


IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



Confirmation No.: 9891 
Art Unit: 1639 

Examiner: WESSENDORF, Teresa D. 
Atty. Docket: 268L0030002/RWE/JKM 



In re application of: 

SNYDER et al.

Application No.: 09/849,781 

Filed: May 4, 2001 

For: Protein Chips for High 

Throughput Screening of Protein 
Activity 

Supplemental Declaration of Barry Schweitzer, Ph.D. 
Under 37 C.F.R. § 1.132 

Attn: Mail Stop Amendment 

Commissioner for Patents 
PO Box 1450 

Alexandria, VA 22313-1450 
Sir: 

I, Dr. Barry Schweitzer, residing at 459 Maple Avenue, Cheshire, CT, USA, do 
hereby declare and state as follows: 

1. I am currently employed by Life Technologies Inc. ("LTI"), a licensee of 
the above-captioned application. I hold the position of Director of Integrated 
Technologies, Molecular Biology Systems Division. A copy of my curriculum vitae is 
attached hereto as Exhibit 1. I received my Ph.D. degree in Pharmacology from Yale 
University. As indicated by my attached curriculum vitae, I have published many papers 
relating to protein microarrays. Based on my education and experience, I am an expert 
in the fields of genomics, molecular genetics, and proteomics (including protein 
microarraying). 

2. I have read and understand the following documents: 

• U.S. Application No. 09/849,781 ("the '781 application"; Exhibit 2)

• Pending claims (Exhibit 3) 

• Office Action mailed July 7, 2009 ("the first Office Action"; Exhibit 4) 

• Office Action mailed July 9, 2010 ("the second Office Action"; Exhibit 5) 
• Declaration entitled "Declaration of Barry Schweitzer, Ph.D., Under 37
C.F.R. §1.132" with Exhibits A-D, dated March 2, 2010 ("my first 
Declaration"; Exhibit 6) 

• Hanks et al., FASEB J. 9:576-596 (1995) ("Hanks1"; Exhibit 7)

• Hanks et al., Science 241:42-52 (1988) ("Hanks2"; Exhibit 8)

• Bold et al., Mol. Cell. Biol. 5(11):331-338 (1985) ("Bold"; Exhibit 9)

• Yaciuk et al., Mol. Cell. Biol. 6(8):2807-2819 (1986) ("Yaciuk"; Exhibit 10)

• Sadowski et al., Oncogene 1:181-191 (1987) ("Sadowski"; Exhibit 11)

• Hubbard et al., Nature 372(6508):746-754 (1994) ("Hubbard"; Exhibit 12)

• Hanks et al., Genome Biology 4:111-111.7 (2003) ("Hanks3"; Exhibit 13)

• Hunter and Plowman TIBS 22:18-22 (1997) ("Hunter and Plowman"; Exhibit 
14) 

• Plowman et al., Proc. Natl. Acad. Sci. 96:13603-13610 (1999) ("Plowman"; 
Exhibit 15) 

• Morrison et al., J. Biol. Chem. 150(2):F57-F62 (2000) ("Morrison"; Exhibit 
16) 

• Manning et al., Science 298:1912-1934 (2002) ("Manning"; Exhibit 17)

• Shaw et al., Drug Discovery and Development (2005) ("Shaw"; Exhibit 18)

• Zhu et al., Nature Genetics 26:283-289 (2000) ("Zhu"; Exhibit 19)

• Tleugabulova et al., J. Chrom. B. 720:153-163 (1998) ("Tleugabulova"; 
Exhibit 20) 

• Groll et al., J. Am. Chem. Soc. 126:4234-4239 (2004) ("Groll"; Exhibit 21) 

• Bussow et al., Nucleic Acids Res. 26(21):5007-5008 (1998) ("Bussow1";
Exhibit 22) 

• Bussow et al., Genomics 65:1-8 (2000) ("Bussow2"; Exhibit 23)

3. In formulating my opinions set forth in this declaration, I have considered 
the viewpoint of a scientist of ordinary skill in the field of proteomics, as of May 4, 
2001, the filing date of the '781 application. I understand that a scientist of ordinary skill 
in the field of proteomics is a hypothetical scientist who thinks along conventional 
wisdom in the field of proteomics, and is a scientist of ordinary creativity who does not 
seek to innovate. A scientist of ordinary skill in the field of proteomics would have had 
general knowledge of the scientific literature concerning proteomics (including protein 
microarraying) technology that was available by May 4, 2001, including knowledge 
about the identification and characterization of protein kinases and also experimental 



techniques and results available in the art. A scientist of ordinary skill in the field of 
proteomics would have a scientific background and hold a Master's degree or Ph.D. in
the biological and/or physical sciences (e.g., pharmacology), and have substantial 
familiarity, training, and experience with proteomics (including protein microarraying). 
It is my understanding that a scientist of ordinary skill in the field of proteomics is a 
hypothetical scientist, whereas an expert in the field of proteomics is an actual scientist. 
I believe that a scientist of ordinary skill in the field of proteomics would agree with my 
opinions expressed in this declaration. 

4. As an expert in the field of proteomics technology since before 2001, I am
qualified to provide an opinion as to what a scientist of ordinary skill in the field of
proteomics would have known and concluded as of May 4, 2001.


5. I understand that claim 1 of the '781 application is directed to a
positionally addressable array comprising a plurality of different substances on a solid
support, with each different substance being at a different position on the solid support,
wherein the density of the different substances on the solid support is at least 100
different substances per cm², and wherein the plurality of different substances comprises
61 purified active kinases (or functional kinase domains thereof) of a mammal, 61
purified active kinases (or functional kinase domains thereof) of a yeast, or 61 purified
active kinases (or functional kinase domains thereof) of a Drosophila.¹

6. I also understand that claim 1, and the claims that depend from claim 1,
have been rejected by the Patent Office for allegedly failing to comply with requirements 
for written description and enablement. 

7. I further understand the Patent Office has taken the position that the '781
application, which includes the exemplary disclosure of arrays containing 111 distinct 



purified active yeast protein kinases, fails to provide an adequate description that a
scientist of ordinary skill would have reasonably believed to be generally applicable to 
distinguishing protein kinases and to predictably arraying at least 61 purified active 
protein kinases (or fragments containing a functional kinase domain) of any organism, 
such as a yeast, a Drosophila or a mammal.² The Patent Office has also argued that
since biotechnology is unpredictable, the '781 application does not provide sufficient
disclosure to teach a scientist of ordinary skill how to make and use the full scope of the
claimed protein arrays containing at least 61 purified active protein kinases (or fragments
containing a functional kinase domain) from an organism, such as a mammal, a yeast, or
a Drosophila, without having to undertake inordinate experimentation.³

8. I have reviewed the Patent Office's statements relating to its bases for 
making these rejections, and I have concluded that the rejections are founded on flawed
and/or unsubstantiated reasoning that leads to incorrect conclusions. More particularly,
the Patent Office's rejections relating to written description and enablement are founded 
on conclusory arguments that overlook the disclosure and guidance provided by the '781 
application and the extensive knowledge of scientists of ordinary skill in the field of 
proteomics on May 4, 2001. For at least the reasons discussed below, it is my opinion 
that the disclosure of the '781 application would indeed have reasonably been understood 
by a scientist of ordinary skill in the field of proteomics on May 4, 2001 to encompass 
and provide ample disclosure of the claimed protein kinase arrays, which contain at least 
61 purified active protein kinases (including fragments containing a functional kinase 



¹ See Pending claims (Exhibit 3).

² Second Office Action at pages 3-10 (Exhibit 5).

³ Second Office Action at pages 11-20 (Exhibit 5).
domain) of a yeast or another organism, such as a Drosophila or a mammal (e.g., a
human), as well as disclosure and guidance that would have been reasonably expected to 
be generally applicable to making and using these arrays. It is also my opinion that a 
scientist of ordinary skill in the field of proteomics on May 4, 2001, enlightened by the 
disclosure and guidance of the '781 application, would have reasonably expected to be
able to, and indeed would have been able to, make and use the protein kinase arrays that
fall within the scope of claim 1 without having to undertake more than routine
experimentation. The written description and enablement rejections are addressed in
turn below.

Disclosure of the '781 application 

9. Well before May 4, 2001, protein kinases were known to represent a large
superfamily made up of hundreds of proteins that were assigned to this superfamily by
virtue of their containing a kinase domain.⁵ For example, by May "about 200
different superfamily members (products of distinct paralogous genes) had been
recognized from mammalian sources alone."⁶ Moreover, protein kinase domains were
known to consist of discrete polypeptide regions of approximately 250-300 amino acid
residues and to contain characteristic patterns of conserved residues, including twelve
invariant or nearly invariant residues.⁷ The conserved patterns and invariant residues in

⁴ See Pending claims (Exhibit 3).

⁵ See e.g., the '781 application at page 9, lines 1-8; page 31, lines 8-17; and
Figures 5a and 5b (Exhibit 2). See also, e.g., Hanks1 (Exhibit 7) referred to on page 31,
line 15, and page 41, lines 11-12 of the '781 application (Exhibit 2), which has been
incorporated by reference (page 46, lines 14-17).

⁶ Hanks1, at page 576, column 2, first paragraph (Exhibit 7).

⁷ See e.g., Hanks2 at page 42 and pages 45-46 (Exhibit 8); Bold (Exhibit 9);
Yaciuk (Exhibit 10); and Sadowski (Exhibit 11).
these kinase domains were known to play essential roles in enzyme function, as
corroborated by the crystal structure of several protein kinase superfamily members.⁸ By
May 4, 2001, alignments and characterization studies of scores of protein kinases and
their protein kinase domains had identified differences and alterations in kinase domain
sequences that retained, abolished, and distinguished kinase activity.⁹ These studies also
identified and characterized distinct conserved sequence motifs in a relative handful of
proteins that had been reported to have kinase activity, but that lacked a conventional
kinase domain.¹⁰ Moreover, from well before May 4, 2001 to the present, sequence
analysis has routinely been applied to reliably identify and distinguish protein kinases.¹¹
Thus, on May 4, 2001, a scientist of ordinary skill in the field of proteomics would have
readily comprehended that protein kinases contain a highly conserved catalytic kinase
domain for which there was an art-recognized correlation between primary amino acid
sequence (i.e., structure) and kinase activity (i.e., function), and that protein kinases were,



⁸ See e.g., Hanks1 at pages 576-577, and page 588, col. 1 to page 592, col. 2, which
discuss the crystal structures of the protein kinases PKA-Calpha, Erk2, twitchin kinase,
casein kinase I, Cdk2, and the insulin receptor; see also note added in proof at page 592,
col. 2 (Exhibit 7); see also Hubbard (Exhibit 12).

⁹ See for example, initial screening of the yeast genome by Hunter and Plowman
at page 14 (Exhibit 14), which has been incorporated by reference in the '781 application
(page 46, lines 14-17 (Exhibit 2)).

¹⁰ See, e.g., Plowman at page 13609 and Table 1 (Exhibit 15) which has been
incorporated by reference in the '781 application (page 46, lines 14-17 (Exhibit 2));
Hanks3 at page 111.2, col. 1, first full paragraph (Exhibit 13); and Morrison at page F59
(Exhibit 16).

¹¹ For example, the protein kinase members identified in the initial screening of
the yeast genome by Hunter and Plowman (Exhibit 14; referred
to on page 34, line 2 of the '781 application (Exhibit 2)), the Drosophila genome by
Morrison (Exhibit 16), and the human genome by Manning (Exhibit 17) were all
initially identified using homology-based analysis. See also Plowman at pages 13604-
13608 and Table 11 (Exhibit 15; referred to on page 27, lines 27-29, and incorporated by
reference at page 46, lines 14-17 of the '781 application (Exhibit 2)).
and could be, reliably recognized and distinguished based on the sequence of their kinase
domain. 

10. Against this backdrop, the '781 application refers to a screen of the yeast
genome conducted by Hunter and Plowman that identified 122 open reading frames
predicted to encode the protein kinases of the yeast genome (i.e., the yeast kinome).
Example 1, at pages 27-41 of the '781 application, discloses the manufacture and
screening of arrays containing 119 of the 122 yeast protein kinases identified by Hunter
and Plowman and reports that "most (i.e., 93% [111 kinases]) kinases" on the arrays
exhibit protein kinase activity. Thus, Example 1 of the '781 application confirms that
on May 4, 2001, protein kinases were readily and reliably recognized and distinguished
based on the primary amino acid sequence of their catalytic kinase domain.

11. The '781 application describes arrays of protein kinases and fragments
containing a functional kinase domain from a yeast and other organisms, including a
mammal and a Drosophila.¹⁴ The '781 application also discloses the production of 17
protein chip arrays containing 111 distinct purified active yeast protein kinases (i.e., 93%
of the yeast kinome) and states that these arrays are intended to be exemplary and
non-limiting.¹⁵ More than 90% (i.e., 111/119) of the protein kinases on the arrays disclosed
in Example 1 are reported to display kinase activity. The active arrayed proteins

¹² '781 application at page 27, line 32 to page 28, line 5; and page 31, lines 6-17
(Exhibit 2); Hunter and Plowman (Exhibit 14), which has been incorporated by reference
in the '781 application (page 46, lines 14-17 (Exhibit 2)).

¹³ See '781 application at page 8, lines 26-30; and page 33, lines 34-35; and Figure
4a (Exhibit 2).

¹⁴ See e.g., '781 application at page 11, second full paragraph (Exhibit 2).
¹⁵ See e.g., '781 application at page 33, lines 14-36 (Exhibit 2).
¹⁶ See e.g., '781 application at page 28, lines 9-14 (Exhibit 2).




include 18 of the 24 previously unstudied yeast protein kinases, and unconventional
protein kinases such as histidine kinases (Sln1, Yil042c) and phospholipid kinases (e.g.,
Mec1).¹⁷ Example 1 also discloses that 27 of the arrayed kinases display tyrosine kinase
activity and are able to phosphorylate the substrate poly(Tyr-Glu).¹⁸ Several of the
kinase-substrate activities disclosed in Example 1 are reported to correspond to known
kinase-substrate relationships and the '781 application states that similarly, the other substrates
identified in Example 1 are likely to be bona fide substrates for their identified
counterpart protein kinase(s) in vivo.¹⁹

12. In view of the extensive disclosure and teaching of the '781 application 
and the art-recognized correlation between the primary sequence and activity of the 
kinase domain, a scientist of ordinary skill in the field of proteomics reading the '781 
application on May 4, 2001, would readily have been able to recognize and distinguish 
protein kinases and would have reasonably concluded that the '781 application provides
ample description of arrays containing at least 61 purified active protein kinases 
(including fragments having a functional kinase domain) from a yeast, or another 
organism, such as a Drosophila, or a mammal (including a human). In particular, a 
scientist of ordinary skill, enlightened by the disclosure and teaching of the '781 
application, would have readily understood that the disclosure and teaching in the 
application applies to the production of equally successful arrays containing active 
protein kinases from other organisms (e.g., a Drosophila or a mammal) that could readily 
be recognized and distinguished from other proteins and that could be arrayed as active 

¹⁷ '781 application at page 34, lines 1-4 (Exhibit 2).
¹⁸ '781 application at page 34, lines 27-36 (Exhibit 2).
¹⁹ '781 application at page 36, lines 4-9 (Exhibit 2).
proteins according to the methods and techniques disclosed in the '781 application,
irrespective of whether the arrayed kinases were well characterized or uncharacterized,
or whether the kinases were then known or yet to be recognized. For at least these
reasons, it is my opinion that a scientist of ordinary skill in the field of proteomics on
May 4, 2001, reading the '781 application, would have reasonably concluded that the
application amply describes the claimed arrays which contain at least 61 purified active
kinases (and fragments containing a functional kinase domain) of a yeast, a Drosophila
or a mammal (e.g., a human).

13. In the second Office Action, the Patent Office states that the disclosure of
the '781 application is limited to that of "a single species" of the claimed genus.²⁰ In
response, I point out that the '781 application discloses the production of 17 arrays
containing 111 distinct and diverse purified active yeast protein kinases - well over the
lower limit of 61 active kinases recited in the pending claims. Additionally, the high
percentage of the large number of yeast protein kinases that display kinase activity on the
arrays prepared according to the methods and techniques disclosed in the '781
application (i.e., almost every arrayed protein corresponding to almost every kinase in
the yeast kinome) provides compelling support that these methods and techniques could
be applied to routinely and predictably produce arrays containing at least 61 active
protein kinases (and fragments containing a functional kinase domain) from an organism,
such as a yeast, a Drosophila, or a mammal. Moreover, the disclosure in Example 1 of
the production and use of 17 arrays that each display 111 active yeast protein kinases provides
further support that would lead a scientist of ordinary skill in the field of proteomics to 

²⁰ Second Office Action, paragraph spanning pages 4-5 (Exhibit 5).
reasonably conclude that the techniques and methodology described in the '781
application are reproducible, generally applicable to arraying large numbers of
active, diverse protein kinases on a single array, and can be routinely used or adapted to
produce and use arrays containing at least 61 active protein kinases (and fragments
containing a functional kinase domain) from, for example, a yeast, a human or another
mammal, or a Drosophila.

14. Additionally, it is significant to note that many of the statements made by 
the Patent Office in support of the written description rejection are unsubstantiated 
and/or non-persuasive in view of the state of the art and the teaching and disclosure of 
the '781 application. In particular, the Patent Office speculates that "a skilled artisan 
recognizes that one cannot rule out the possibility that" (a) it might be difficult to array
poorly characterized protein kinase family members without denaturing them; (b)
kinases other than the desired enzyme can contaminate the purification preparations; and
(c) the kind and type of substrate may also be a factor that will influence the activity of
an arrayed protein kinase.²¹

15. In response, I point out that, as would have been immediately apparent to 
a scientist of ordinary skill reading the '781 application, the presence of poorly 
characterized protein kinases, the possibility of contamination, and the lack of 
knowledge relating to substrate identity did not prevent the successful production and 
use of the arrays exemplified in Example 1. In particular, the '781 application, at page
27, lines 34-35 and page 34, lines 1-4, discloses that 75% (i.e., 18/24) of the arrayed
proteins that had not been previously studied (i.e., were "poorly characterized")

²¹ Second Office Action at page 5, quoting statement in the '781 application at
page 36, line 10 (Exhibit 4).
displayed kinase activity and phosphorylated one or more substrates. With respect to the
potential for sample contamination, the '781 application at page 35, lines 29-33, states
that "[o]ne concern with these studies is that it is possible that kinases other than the
desired enzyme are contaminating our preparations. Although this cannot be rigorously
ruled out, analysis of five of our samples by Coomasie staining and immunoblot staining
with anti-GST does not reveal any detectable bands in our preparation that are not GST
fusions (see methods)." Therefore, contamination would not appear to have prevented
the successful production of the arrays containing active protein kinases in Example 1 of
the '781 application. Lastly, with respect to substrate identity, the high percentage of
arrayed protein kinases that demonstrate kinase activity in Example 1 (i.e., 93%)
indicates that the kind and type of substrate did not prevent the successful manufacture
and use of arrays containing purified active kinases according to the methods disclosed
in the '781 application. Accordingly, poorly characterized protein kinases,
contamination, and substrate anonymity did not prevent the successful production and
use of the 17 arrays containing the purified active protein kinases disclosed in Example
1. Moreover, in view of the detailed disclosure and teaching of the '781 application and
the high level of knowledge and skill in the field of proteomics on May 4, 2001, a
scientist of ordinary skill would have reasonably expected that the reagents,
methodology, and techniques disclosed and taught in the '781 application could be
routinely, reliably, and predictably applied to produce protein arrays containing at least
61 purified active protein kinases from yeast and other organisms, such as a
Drosophila, or a human or other mammal.



²² '781 application at page 34, lines 27-36; and page 36, lines 4-9 (Exhibit 2).

16. The Patent Office additionally emphasizes the statement in the '781
application that "although most of the kinases were active in [our] assays, several were
not."²³ In response, I point out that the Patent Office's choice to emphasize this
statement reflects its failure to appreciate the full extent of the disclosure and teaching of
the '781 application in its entirety, as it would have been understood by a scientist of
ordinary skill in the field of proteomics on May 4, 2001. In particular, this misplaced
emphasis overlooks the fact that the '781 application discloses the production and use of
17 arrays on which almost every protein kinase in the kinome of an organism (i.e., yeast)
displays kinase activity. The high percentage (i.e., greater than 90%) of the large number
of distinct and diverse arrayed protein kinases (i.e., 121) that are active on the kinase
arrays (i.e., 111) described in Example 1 would have been viewed on May 4, 2001 by a
scientist of ordinary skill in the field of proteomics to be a highly successful experiment
that represented a substantial technological advancement in protein arraying.

17. Additionally, in the second Office Action at pages 15-16, the Patent 
Office relies on statements made in Shaw²⁴ to support the position that "proteins have
proven to be much trickier to work with in array format than their genomic counterparts"
and that protein arraying is unpredictable due to issues such as stability, the protein
arraying technique utilized, and non-specific binding. In response, while I agree that
producing protein arrays might generally be viewed as challenging, particularly when
compared to producing DNA microarrays, I disagree that this viewpoint would have
prevented a scientist of ordinary skill in the art of proteomics on May 4, 2001, enlightened
by the disclosure and guidance provided by the '781 application, from reasonably

²³ Second Office Action at page 15 (Exhibit 5).
²⁴ Shaw (Exhibit 18).
concluding that the '781 application adequately describes the making and using of
protein arrays containing at least 61 active protein kinases from yeast or another
organism, including a Drosophila or a mammal, such as a human (including fragments
having a functional kinase domain of these protein kinases). In particular, the high
percentage (i.e., greater than 90%) of the large number of distinct protein kinases that are
active on the arrays disclosed in Example 1 (i.e., 111) indicates that protein stability,
choice of immobilization technique and non-specific binding did not prevent the
successful arraying and use of protein kinase arrays prepared according to the disclosure
and teaching of the '781 application. Furthermore, in view of the known correlation
between the primary sequence and activity of the kinase domain and the general
applicability of the methods disclosed and taught in the '781 application for
recombinantly expressing, purifying, and arraying a large number of distinct and diverse
active protein kinases from other organisms, a scientist of ordinary skill in the field of
proteomics would have reasonably expected that the methods, techniques and reagents
disclosed in the '781 application could be used to produce equally successful arrays
containing at least 61 purified active protein kinases (or fragments containing a
functional kinase domain) of yeast or another organism, such as a Drosophila or a
mammal (e.g., a human).

Enablement Rejections of the '781 application

18. As discussed above, prior to May 4, 2001, there was an art-recognized 
correlation between the primary sequence (i.e., structure) and kinase activity (i.e., 
function) of the catalytic domain of protein kinases. Kinase domains were known to 
represent discrete regions containing characteristic patterns of conserved and invariant or
nearly invariant amino acid residues that play essential roles in conferring kinase
activity. Well before May 4, 2001, scientists of ordinary skill in the field of proteomics
relied on this known correlation between the sequence and activity of kinase domains to
reliably recognize and distinguish protein kinases using primary sequence analyses.²⁵
The use of sequence analysis to identify protein kinases had been validated through
numerous activity studies and is further corroborated by the disclosure in Example 1 of
the '781 application, which demonstrates that almost every yeast protein kinase that was
identified by Hunter and Plowman based on primary sequence analysis displays kinase
activity. Accordingly, a scientist of ordinary skill in the field of proteomics on May 4,
2001 would have reasonably expected that protein kinases and fragments containing a
functional kinase domain (including fully characterized protein kinases, poorly
characterized protein kinases, or proteins predicted to have kinase activity based on
deduced polypeptide sequence) would display kinase activity.

19. The '781 application discloses the recombinant production of chimeric
fusion proteins containing a tag (e.g., glutathione-S-transferase (GST)) fused to a large 
number of protein kinases (e.g., corresponding to almost every protein kinase in the 
kinome of an organism (e.g., Saccharomyces cerevisiae)); the use of a reagent having 
affinity for this tag component (e.g., the affinity of glutathione for the GST tag) to 
rapidly and efficiently purify the chimeric kinase proteins from host cell lysates at low 
temperatures (4°C) and under non-denaturing conditions; the design, manufacture and 
optimization of solid supports and linking agents; and the immobilization and arraying of 



²⁵ For example, the protein kinase members identified in the initial screening of
the yeast genome by Hunter and Plowman (Exhibit 14; referred
to on page 34, line 2 of the '781 application (Exhibit 2)), the Drosophila genome by
Morrison (Exhibit 16), and the human genome by Manning (Exhibit 17) were all
initially identified using homology-based analysis. See also Plowman at pages 13604-
the chimeric protein kinases onto the solid support in a manner so as to retain protein
kinase activity, as well as the subsequent assaying of the arrayed protein kinases for
kinase activity. In Example 1, the methods and teachings disclosed in the '781
application are applied to produce 17 exemplary²⁷ arrays, each containing 111 distinct
purified active yeast protein kinases that represent almost every protein in the yeast
kinome. More than 90% of the proteins arrayed in Example 1 display kinase activity
including 18 uncharacterized yeast protein kinases, and unconventional kinases such as
histidine kinases and phospholipid kinases.²⁹ Example 1 also discloses that 27 of the
arrayed kinases display tyrosine kinase activity. The '781 application additionally
reports that several of the kinase-substrate relationships reported in Example 1
correspond to known phosphorylation interactions in vivo.³¹

20. In view of the extensive disclosure, teaching and guidance of the '781 
application and the high level of knowledge and skill in the field of proteomics on May 
4, 2001, a scientist of ordinary skill, having read the '781 application, would have
reasonably concluded that the disclosure and teaching of the '781 application could be 
routinely applied or modified to produce arrays containing at least 61 purified active 
protein kinases (including fragments having a functional kinase domain) from an 

13608 and Table 11 (Exhibit 15; referred to on page 27, lines 27-29 of the '781
application (Exhibit 2)).

²⁶ See e.g., '781 application at page 26, line 25, through page 27, line 19; and
page 28, line 3 to page 37, line 34 (Exhibit 2).

²⁷ '781 application at page 33, lines 14-15 (Exhibit 2).

²⁸ '781 application at page 28, lines 12-14 (Exhibit 2).

²⁹ '781 application at page 34, lines 1-4 (Exhibit 2).

³⁰ '781 application at page 34, lines 27-36 (Exhibit 2).

³¹ '781 application, at page 36, lines 4-9 (Exhibit 2).
organism, such as from a mammal, a yeast, or a Drosophila. Moreover, in view of the
disclosure and teaching of the '781 application, a scientist of ordinary skill would have 
reasonably come to this same conclusion irrespective of whether the arrayed protein
kinases are: (a) from a yeast, a Drosophila, or from a human or any other mammal; (b)
fully characterized, poorly characterized, or yet to be identified; or (c) full-length, a
fragment containing a functional kinase domain, or a polypeptide predicted to have 
protein kinase activity based on its deduced amino acid sequence. 

21. On pages 12-13 of the second Office Action, the Patent Office states:

[i]n a highly unpredictable art, as biotechnology, where one cannot 
predict whether one species would be predictive to the huge scope of the 
claims, one cannot make a priori statement without any experimental 
studies. Factors such as the compatibility of the array with the substrates 
and compounds disposed therein, the compounds (kinases) itself and other
unpredictable variables can affect the active form of any kinase. Thus, 
one cannot predict from a single species its correspondence or 
extrapolation to the genus, as claimed. 

In response, I point out that this statement reflects the Patent Office's failure to fully 
appreciate the state of the knowledge relating to protein kinases on May 4, 2001, the 
level of skill of scientists of ordinary skill on this date, and the extent to which these 
scientists would have understood the disclosure, teaching, and guidance of the '781 
application to be generally applicable to, and predictive of, the ability to routinely make 
and use the claimed arrays containing at least 61 purified active protein kinases of a 
yeast, a Drosophila or a mammal (including fragments having a functional kinase 
domain). 

22. In particular, the '781 application discloses methods and techniques that 
are demonstrated to successfully produce protein arrays containing a surprisingly high 
percentage (i.e., greater than 90%) of active, distinct and diverse protein kinases 
(representing "nearly all" of the protein kinases in the yeast kinome). Given the 
immediately apparent general applicability of the methods disclosed in the '781 
application for recombinantly expressing, purifying, and arraying a large number of 
active protein kinases, a scientist of ordinary skill in the field of proteomics would have 
reasonably expected, and correctly so, that the methods and techniques disclosed in the 
'781 application could be used or routinely modified to produce equally successful 
arrays containing at least 61 purified active protein kinases (or fragments containing a 
functional kinase domain) from an organism, such as a Drosophila, a yeast, or a human 
or another mammal. 

23. Moreover, the Patent Office's speculation that factors such as substrate 
and array compatibility, distinctions between kinases, and "other unpredictable 
variables" make the manufacture of the claimed protein kinase arrays unpredictable is 
baseless in view of the disclosure and teaching of the '781 application, which includes 
the successful production of the active kinase protein arrays reported in Example 1. In 
particular, as would be immediately apparent to a scientist of ordinary skill in the field of 
proteomics on May 4, 2001, substrate and array compatibility, distinctions between 
kinases, and "other unpredictable variables" did not prevent the successful arraying of 
the purified active yeast protein kinases using the methods and techniques disclosed in 
the '781 application. Moreover, in my opinion, the Patent Office has provided no 
reasonable or compelling basis that would have led a scientist of ordinary skill in the 
field of proteomics on May 4, 2001, enlightened by the disclosure and teaching of the 
'781 application, to disregard or doubt that the generally applicable disclosure and 

'781 application at page 28, lines 3-14 (Exhibit 2), 
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teachings of the '781 application could be equally successfully applied to array purified 
active protein kinases from yeast or from other organisms such as a Drosophila, or a 
mammal. In particular, I point out that the Patent Office's conclusory and unclearly 
articulated statement that "the fact remains that purification the technique [sic] and other 
experimental conditions/steps for yeast would be different from any type of mammals" 
is inconsequential and unsubstantiated in view of the advanced state of the art, the high 
level of skill in the field of proteomics on May 4, 2001, and the detailed disclosure and 
teaching of the '781 application discussed herein. For example, the specification 
discloses that the protein kinases can be recombinantly expressed as fusion proteins 
containing tags (e.g., glutathione-S-transferase (GST)) and that these fusion proteins can 
be rapidly purified and arrayed using reagents having affinity for such tags (e.g., 
glutathione). As would be immediately apparent to a scientist of ordinary skill, the use 
of the approaches disclosed in the '781 application for recombinantly expressing and 
purifying tagged protein kinases using affinity reagents would be expected to be 
generally applicable to recombinantly expressing, purifying, and arraying protein kinases 
from any organism. Moreover, I disagree with the Patent Office and in my opinion, a 
scientist of ordinary skill, reading the disclosure of the '781 application, particularly 
Example 1, would have reasonably concluded that the disclosed methods and techniques 
for producing and arraying large numbers of active purified protein kinases, including for 
example, methods and techniques for recombinantly producing, purifying and arraying 
active protein kinases, would be expected to be equally applicable to, and would require 

The Second Office Action at page 16, first full paragraph (Exhibit 5) 
See, e.g., 781 application at page 17, lines 12-17; and page 32, lines 8-22 

(Exhibit 2). 
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at most routine modification for, successfully producing arrays containing active purified 
protein kinases from yeast and other organisms, including for example, a mammal (e.g., 
a human), or a Drosophila. 

24. As I have previously testified, when I first read Dr. Michael Snyder's 
journal publication corresponding to the '781 application (i.e., Zhu; Exhibit 19) on or 
around its publication date, I was surprised by the publication's report of the successful 
production of arrays containing such a large number (i.e., 119) of different purified 
active protein kinases for which such a high percentage of the arrayed proteins displayed 
kinase activity (i.e., 94% (112/119)). Prior to this work, it was generally understood that 
the technical limitations associated with preparing and arraying large numbers of 
proteins on a single array typically led to protein denaturation and conformational 
changes that resulted in a significant loss of protein activity among the arrayed 
proteins. 

25. It was only after having been enlightened by the disclosure, teaching and 
guidance provided in the 781 application that a scientist of ordinary skill in the field of 
proteomics would have been able to routinely produce the claimed arrays containing at 
least 61 purified active protein kinases, or fragments containing a functional protein 
kinase domain. More particularly, as discussed herein, the 781 application discloses 
techniques and methods for recombinantly producing and rapidly purifying and arraying 
high densities of large numbers of active protein kinases, as well as the use of the arrays 



See, e.g., Tleugabulova (Exhibit 20), Groll (Exhibit 21); Bussowl (Exhibit 
22); and Bussow2 (Exhibit 23). 
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prepared according to the teaching of the 781 application in assaying for kinase 
activity. 

26. A scientist of ordinary skill in the field of proteomics, enlightened by the 
disclosure and teaching of the 781 application on May 4, 2001, would have reasonably 
concluded that, as disclosed in the 781 application, the disclosed methods and 
techniques and the exemplified production and use of arrays containing purified active 
yeast protein kinases, could be routinely applied to make and use equally successful 
arrays containing at least 61 purified active protein kinases or fragments containing 
functional kinase domains, from a yeast, a Drosophila, or a mammal (including a human), 
without having to undertake inordinate or excessive experimentation. 

27. As I have also previously testified, after reading Zhu (Exhibit 19), I joined 
Protometrix, Inc. (Protometrix), the licensee of Dr. Snyder's protein array technology, 
and subsequent to my joining Protometrix, researchers at the company relied on the 
information set forth in the '781 application and the known homologies between human 
and yeast kinases to identify genes encoding kinases and functional kinase domains and to 
successfully manufacture human protein arrays on which at least 61 purified protein 
kinases or fragments containing kinase domains are active as demonstrated by catalytic 
activity. It is my belief and understanding that the Protometrix researchers succeeded in 
making these arrays by expressing proteins in a baculovirus expression system and by 
relying on, and making at most minor conventional adaptations to the teaching of the 
'781 application. Thus, no more than routine adaptation of the teaching of the '781 

See, e.g., 781 application at page 26, line 25, through page 27, line 19; and 
page 28, line 3 to line 22. 

See, e.g., the description of the methods applied by the Protometrix 
researchers provided in Paragraphs 10-13 of my first Declaration (Exhibit 6). 
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application was applied by the researchers at Protometrix to identify human protein 

kinases and their functional domains, and make and use positionally addressable arrays 
comprising different proteins on a solid support, with each different protein being at a 
different position on the solid support, wherein the density of the different proteins on 
the solid support was at least 100 different proteins per cm², and contained at least 61 
purified active human kinases or functional kinase domains, as presently claimed in the 
'781 application. It is also my opinion that the successful arraying of the active human 
protein kinases by the Protometrix researchers would have reasonably been expected by 
a scientist of ordinary skill in proteomics reading the '781 application on May 4, 2001, in 
view of the reliance of the Protometrix researchers on the teaching of the '781 
application in making and using these arrays. 

28. In the second Office Action, at page 20, the Patent Office quotes the 
following statement made in one of my publications: 

[t]he family of human protein kinases consists of more than 500 members 
of which only a fraction have been characterized to date. Much is still not 
known about the biological function of many kinases, the protein 
substrates that are phosphorylated by these kinases, or the roles of these 
kinases and substrates in disease. . . . 

The Patent Office then comments: 

[t]hus, Schweitzer has not extrapolated or predicted its findings to any 
other family [of] human protein kinases which consist of more than 500 
members to which only a fraction has been characterized to date. 

Although not clearly articulated, it appears that the Patent Office is asserting I did not 
extrapolate or predict that the successful arraying of human protein kinases described in 
my first Declaration could be reasonably extended to arraying other, poorly characterized 



My first Declaration, Exhibit C, at page 2, column 1, paragraph 2 to page 3, 
column 1, paragraph 1 (2004) (Exhibit 6). 
human protein kinases. In response, I first point out that the Patent Office has relied on the 
quoted excerpt out of context. Specifically, I note that the fact that much may not be known 
about the biological function, substrates, and roles of protein kinases and their substrates 
in disease, does not support the position that protein kinase family members (and their 
fragments, having a functional kinase domain) would not be expected to display kinase 
activity when arrayed according to the teaching of the '781 application. Moreover, the 
Patent Office's reliance on this argument disregards my previous testimony that the 
Protometrix arrays contain well over 122 active human kinases and functional kinase 
domains. Thus, a high percentage of all the protein kinases in the human kinome are 
arrayed as active kinases on the Protometrix arrays (i.e., despite the fact that "only a 
fraction have been characterized to date"). To directly address the point raised by the 
Patent Office, it is my opinion that a scientist of ordinary skill, having read the 781 
application on May 4, 2001, would have reasonably expected that by applying the 
disclosure and teaching of the '781 application, well characterized, poorly characterized, 
and uncharacterized protein kinases could be routinely and successfully arrayed as active 
kinases. Moreover, on May 4, 2001, a scientist of ordinary skill in the field of 
proteomics would indeed have reasonably expected that by applying the disclosure and 
teaching of the '781 application, at least 61 protein kinases could similarly be routinely 
and successfully arrayed and thus, would likewise display the requisite protein kinase 

My first Declaration, at Paragraph 13 (Exhibit 6). As an addendum to, and in 
clarification of, my previous testimony, I note that while to the best of my knowledge 
Protometrix researchers have not formally rigorously proven that all 400 of the human 
kinases and functional kinase domains arrayed on Invitrogen's Human ProtoArray High 
Density Protein Microarrays™ are active, it is my belief and understanding based on 
autophosphorylation studies that a high percentage of these arrayed proteins are active 
(i.e., well over the 122 purified active human kinases or human functional kinase 
domains recited in pending claim 186 (Exhibit 3)). 
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activity, irrespective of whether the kinases are: (a) known or yet to be discovered; (b) 
well characterized, poorly characterized or uncharacterized; (c) full-length or a fragment 
containing a kinase domain; or (d) from a yeast, a Drosophila, or a human or another 
mammal. 

Summary 

29. The '781 application contains extensive disclosure and guidance relating 
to the manufacture and use of arrays that fall within the scope of the pending claims. 
In particular, Example 1 discloses methods and techniques that successfully produced 17 
arrays containing 111 purified distinct and diverse active protein kinases corresponding 
to almost every protein kinase in the yeast kinome, of which a high percentage of the 
arrayed proteins displayed kinase activity. 

30. For at least the reasons presented in this declaration, it is my opinion that 
in view of the teaching and guidance of the 781 application and the extensive knowledge 
in the field of proteomics on May 4, 2001, which included an art-recognized correlation 
between the sequence (i.e., structure) and kinase activity (i.e., function) of kinase 
domains, a scientist of ordinary skill in the field of proteomics, enlightened by the 
disclosure of the 781 application would have reasonably concluded that: 

(a) protein kinases from different organisms were readily and reliably 
recognized and distinguished based on the primary amino acid sequences of their 
catalytic kinase domain and proteins containing these kinase domains were expected to 
display kinase activity; 



Pending claims (Exhibit 3). 
(b) characterized and uncharacterized or unidentified protein kinases (and 
fragments containing a functional kinase domain of these proteins) from a yeast or other 
organism, such as a Drosophila, or a mammal (including a human) could be readily 
recognized, distinguished, and arrayed in accordance with the disclosure and teaching of 
the 781 application; and 

(c) the '781 application contains extensive disclosure and guidance that 
provides ample description of methods for reliably and predictably making and using 
protein arrays falling within the scope of claim 1 and that contain at least 61 purified 
active protein kinases (including fragments containing a functional kinase domain) from 
a yeast or another organism, such as a Drosophila or a mammal (including a human), and 
the arrays containing these purified active kinases. 

31. Accordingly, for at least the above reasons, it is my opinion that a 
scientist of ordinary skill in the field of proteomics on May 4, 2001, reading the '781 
application, would have reasonably concluded that the application amply describes the 
claimed arrays which contain at least 61 purified active kinases (and fragments 
containing a functional kinase domain) of a yeast, a Drosophila or a mammal (including 
a human). 

32. Additionally, for at least the reasons presented herein, it is my opinion 
that in view of the extensive knowledge in the field of proteomics and the art-recognized 
correlation between the sequence and kinase activity of kinase domains on May 4, 2001, 
a scientist of ordinary skill in the field of proteomics, enlightened by the teaching, 
disclosure, and guidance of the '781 application, would have reasonably concluded that: 

(a) protein kinases (and fragments containing a functional kinase domain) 
from different organisms were and could be routinely and predictably recognized and 
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distinguished from other proteins based on the sequences of their catalytic kinase domain 
and proteins containing these kinase domains would be expected to display kinase 
activity that could readily be assayed using methods disclosed in the '781 application or 
otherwise known in the art; and 

(b) the extensive disclosure, teaching, and guidance of the '781 application, 
which includes methods and reagents used to successfully array purified active protein 
kinases corresponding to more than 90% of the yeast kinome, could be applied to 
routinely and predictably produce arrays encompassed by the pending claims and that in 
particular, contain at least 61 purified active protein kinases (including fragments 
containing a functional kinase domain) from a yeast and other organisms, such as a 
Drosophila or a mammal (including a human). 

33. For at least the above reasons, a scientist of ordinary skill in the art of 
proteomics, enlightened by the extensive guidance and teaching of the '781 application, 
would have reasonably concluded that the methods and techniques disclosed in the 
application could be routinely applied (as indeed they were) to make and use equally 
successful arrays falling within the scope of the pending claims (Exhibit 3) and that 
contain at least 61 purified active protein kinases or fragments containing functional kinase 
domains, from a yeast, a Drosophila, or a mammal (including a human), without having 
to undertake more than routine experimentation. 

34. I hereby declare that all statements made herein of my own knowledge are 
true and that all statements made on information and belief are believed to be true; and 
further that these statements were made with the knowledge that willful false statements 
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and the like so made are punishable by fine or imprisonment, or both, under Section 
1001 of Title 18 of the United States Code and that such willful false statements may 
jeopardize the validity of the present patent application or any patent issued thereon. 

Respectfully submitted, 



Barry Schweitzer, Ph.D. 



Date: 
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PROFESSIONAL EXPERIENCE 

INVITROGEN CORPORATION (Now LIFE TECHNOLOGIES), Carlsbad, CA 2004 - Present 

Director, Integrated Technologies, Molecular Biology Systems Division - 2009 to present 
Director, Protein Analysis R&D - 2008 
Director, Protein Array R&D and Site Leader - 2006 - 2007 
Director, Protein Array R&D and Operations - 2004 - 2006 

Current responsibilities include the oversight of programs which span the traditional segments of the Molecular Biology Reagent 
Business, particularly programs that integrate instrumentation with consumables. Previous responsibilities included oversight of 
R&D and Services for Invitrogen's Protein Analysis product lines, including protein separation technologies, Western technologies, 
mass spectroscopy, and protein arrays. Additional responsibilities included site leadership of the Protein Array Center in Branford, 
CT, including R&D, Services, Manufacturing, Quality, and Facilities functions. Other responsibilities include budget preparation and 
implementation, intellectual property management, and oversight of academic, government, and industrial collaborations and 
contracts. Also participating in technology and intellectual property evaluations, business development, grant preparations, 
community relations, and presentations at national and international meetings. Reporting to the Vice President, R&D of the 
Molecular Biology Reagents Business Unit. 

Leadership accomplishments include: 

Led transfer of all operations from Branford, CT to Carlsbad on-time, under budget, and without loss of revenue 
Led global launch of several new multimillion dollar products 
Championed Lean Six Sigma Black Belt and Green Belt projects 
Led ISO 9001 Certification of Branford Site 

Led successful completion of multimillion dollar Biodefense projects in partnership with the United States Army 
Medical Research Institute for Infectious Diseases (USAMRIID) 
Authored or co-authored 11 publications, including paper in Nature 
Inventor or co-inventor on 10 new patent applications 
Presented at 14 international scientific conferences. 

PROTOMETRIX, INC., Branford, CT 2002 - 2004 

Senior Director, Technology - 2003-2004 
Director of Technology - 2002 - 2003 

Fifth person to join start-up biotechnology company. Director of a research and development operation providing high-throughput 
gene cloning, protein expression, protein purification, and protein microarray manufacturing for products, services, and discovery. 
Additional responsibilities included leading product development teams, leading technology and intellectual property diligence 
reviews, presenting to investors, coordinating industrial collaborations, and managing prosecution of company intellectual property. 
Reported to the Vice President, R&D. 

Leadership accomplishments included: 

Led the Protometrix technical and IP diligence team during the acquisition of the company by Invitrogen Corp. 
Led the commercial launch of the world's first functional protein microarray product. 
Established the 1st manufacturing facility for the production of protein arrays. 
Built highly skilled team of scientists, engineers, and informatics specialists 
Led the design and buildout of 14,000 s.f state-of-the-art laboratory and company headquarters. 
MOLECULAR STAGING, INC., New Haven, CT 1998 - 2002 

Director of Proteomics - 2001 - 2002 
Section Head - 1998-2000 

Second person to join start-up biotechnology company. Director of a research and service operation providing high-throughput 
protein expression profiling data using proprietary protein microarray technology to academic, government, and corporate clients. 
Responsibilities included management of research personnel, budget preparation and implementation, business development, 
oversight of academic collaborators, preparation of publications and patent applications, presentations for investors, corporate 
partners and at national meetings. Four direct and 23 indirect reports. Reporting to Chief Operating Officer. 

Leadership accomplishments included: 

■ Successfully launched the world's first microarray-based protein expression profiling service. 
■ Developed the world's most advanced manufacturing facility for production of antibody microarrays. 
■ 8 publications, including publication in Nature Biotechnology of 1st application of antibody microarrays for protein 
expression profiling. 
■ 1 issued patent, and 4 patent applications. 
■ Led and coordinated the design and buildout of 46,000 s.f. of state-of-the-art proteomics laboratory. 
■ Successfully moved an academic technology into an industrial setting, increasing sensitivity, robustness, and utility. 
■ Led project resulting in $9 MM equity investment by Fortune 100 Company. 
■ Gave technical presentations resulting in $40 MM 2nd round financing. 

WALT DISNEY MEMORIAL CANCER CENTER, Orlando, FL 1994 - 1998 

Division Director. Laboratory director of multidisciplinary research program in the structural biology of nucleic acids, proteins, and 
drugs involved in cancer and related diseases. Responsibilities included carrying out experiments and data analysis, project 
development, management of 15-20 research, administrative, and volunteer personnel, budget preparation and implementation, grant 
writing, preparation of publications, public relations, and mentoring of graduate, undergraduate and high school students. 
Scientific Director Molecular Diagnostics Clinical Laboratory. Responsibilities included business plan preparation and 
implementation, management of technical staff, technical consultant, clinical research director, and physician outreach. 

Leadership accomplishments included: 

■ Established and directed a program utilizing multidimensional nuclear magnetic resonance (NMR) spectroscopy, and 
computational chemistry to determine high-resolution structures of proteins, nucleic acids, and drug complexes for the 
purpose of chemotherapeutic development. 

■ Established and directed a laboratory utilizing the most advanced molecular techniques to diagnose infectious diseases, 
cancer, and inherited diseases for patients of Florida Hospital (2nd largest number of admissions in U.S.). 

UNIVERSITY OF CENTRAL FLORIDA, Orlando, FL 1994 - 1998 

Assistant Professor 

Responsibilities included: Research, Florida Hospital liaison, committee service, mentoring of graduate and undergraduate students, 
taught courses in Principles of Modern NMR Spectroscopy, Special Topics in Drug Development, Advanced Biochemistry 
Laboratory 

EARLIER POSITIONS: Associate Research Scientist (1991-1993), Yale University School of Medicine, and Research 
Associate (1990-1991), Memorial Sloan-Kettering Cancer Center 



OTHER EXPERIENCE 

GLYGENIX, INC., Cheshire, CT 2005 - 2007 

Member, Board of Directors. Glygenix was established to benefit children born with Glycogen Storage Disease, Type I 
(GSD1). Its goal is to help find a cure for this disease by raising monies for GSD1-related research. 

THE EPISCOPAL CHURCH AT YALE, New Haven, CT 2000 - 2003 

Member, Board of Governors. The Episcopal Church at Yale (ECY) is a full time ministry of the Episcopal Church to students, 
staff and faculty at Yale. The ECY is governed by a Board of Governors of the Episcopal Church at Yale Corporation which is the 
legal entity of the Corporation in matters of contracts and other transactions with other institutions such as Yale University. 
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PUBLICATIONS 

Original Articles, 

1. Schweitzer, B. I., and Bacopoulos, N. G. (1983) Reversible decrease in dopaminergic 3H-agonist binding after 6- 
hydroxydopamine and irreversible decrease after kainic acid. Life Sciences 32, 531-541. 

2. Merker, M., Rice, J., Schweitzer, B., and Handschumacher, R. E. (1983) Cyclosporine binding component in BW5147 
lymphoblasts and normal lymphoid tissue. Transplantation Proceedings 15, 2265-2270. 

3. DeReimer, S. A., Schweitzer, B., and Kaczmarek, L. K. (1985) Inhibitors of calcium-dependent enzymes prevent the 
onset of afterdischarge in the peptidergic bag cell neurons of Aplysia. Brain Research 340, 175-180. 

4. Srimatkandada, S., Schweitzer, B. I., Moroson, B. A., Dube, S., and Bertino, J. R. (1989) Amplification of a 
polymorphic dihydrofolate reductase gene expressing an enzyme with decreased binding to methotrexate in a human 
colon carcinoma cell line, HCT8R4, resistant to this drug. Journal of Biological Chemistry 264, 3524-3528. 

5. Schweitzer, B. I., Srimatkandada, S., Gritsman, H., Sheridan, R., Venkataraghavan, R., and Bertino, J. R. (1989) 
Probing the role of two hydrophobic residues in the active site of the human dihydrofolate reductase by site-directed 
mutagenesis. Journal of Biological Chemistry 264, 20786-20795. 

6. Dicker, A. P., Volkenandt, M., Schweitzer, B. I., Banerjee, D., and Bertino, J. R. (1990) Identification and 
characterization of a mutation in the dihydrofolate reductase gene from the methotrexate resistant Chinese hamster 
ovary cell line, Pro-3 MTXRIII. Journal of Biological Chemistry 265, 8317-8321. 

7. Li, W. W., Lin, J. T., Schweitzer, B. I., and Bertino, J. R. (1991) Mechanisms of sensitivity and natural resistance to 
antifolates in a methylcholanthrene-induced rat sarcoma. Molecular Pharmacology 40, 854-858. 

8. Li, W. W., Lin, J. T., Chang, Y. M., Schweitzer, B. I., and Bertino, J. R. (1991) Prediction of antifolate efficacy in a rat 
sarcoma model. International Journal of Cancer 49, 234-238. 

9. Lin, J. T., Tong, W. P., Trippett, T. M., Niedzwiecki, D., Tao, Y., Tan, C., Steinherz, P., Schweitzer, B. I., and Bertino, 
J. R. (1991) Basis for natural resistance to methotrexate in human acute non-lymphocytic leukemia. Leukemia 
Research 15, 1191-1196. 

10. Sapse, A.-M., Schweitzer, B. I., Dicker, A. P., Bertino, J. R., and Precer, V. (1992) Ab initio studies of aromatic- 
aromatic and aromatic-polar interactions in the binding of substrate and inhibitor to dihydrofolate reductase. 
International Journal of Peptide and Protein Research 39, 18-23. 

11. Trippett, T., Schlemmer, S., Elisseyeff, Y., Wachter, M., Steinherz, P., Berman, E., Rosowsky, A., Schweitzer, B., and 
Bertino, J. R. (1992) Evidence for defective transport as a mechanism of acquired resistance to MTX in patients with 
acute lymphocytic leukemia. Blood 80, 1158-1162. 

12. Li, W. W., Lin, J. T., Schweitzer, B. I., Tong, W. P., Niedzwiecki, D., Brennan, M. P., and Bertino, J. R. (1992) 
Intrinsic resistance to methotrexate in human soft tissue sarcoma cell lines. Cancer Research 52, 1-6. 

13. Li, M.-X., Hantzopoulos, P. A., Banerjee, D., Zhao, S. C., Schweitzer, B. I., Gilboa, E., and Bertino, J. R. (1992) 
Comparison of the expression of a mutant dihydrofolate reductase under control of different internal promoters in 
retroviral vectors. Human Gene Therapy 3, 381-390. 

14. Volkenandt, M., Dicker, A. P., Banerjee, D., Fanin, R., Schweitzer, B. I., Horikoshi, T., Danenberg, K., Danenberg, P., 
and Bertino, J. R. (1992) Quantitation of gene copy number and mRNA using the polymerase chain reaction. 
Proceedings of the Society for Experimental Biology and Medicine 200, 1-6. 

15. Fanin, R., Banerjee, D., Volkenandt, M., Waltham, M., Li, W. W., Dicker, A. P., Schweitzer, B. I., and Bertino, J. R. 
(1993) Mutations leading to antifolate resistance in Chinese hamster ovary cells after exposure to the alkylating agent 
ethylmethanesulfonate. Molecular Pharmacology 13-21. 

16. Goker, E., Lin, J. T., Trippet, T., Elisseyeff, Y., Tong, W. P., Niedzwiecki, D., Tan, C., Steinherz, P., Schweitzer, B. I., 
and Bertino, J. R. (1993) Decreased polyglutamylation of methotrexate in acute lymphoblastic leukemia blasts in adults 
compared to children with this disease. Leukemia 7, 1000-1004. 

17. Li, W. W., Waltham, M., Tong, W., Schweitzer, B. I., and Bertino, J. R. (1993) Increased activity of gamma-glutamyl 
hydrolase in human sarcoma cell lines: a novel mechanism of intrinsic resistance to methotrexate (MTX). Advances in 
Experimental Medicine and Biology 338, 635-638. 

18. Kellogg, G. W., and Schweitzer, B. I. (1993) Two- and three-dimensional 31P-driven NMR procedures for complete 
assignment of backbone resonances in oligodeoxyribonucleotides. Journal of Biomolecular NMR 3, 577-595. 

19. Dicker, A. P., Waltham, M., Volkenandt, M., Schweitzer, B. I., Otter, G. M., Schmid, F. A., Sirotnak, F. M., and 
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Bertino, J. R. (1993) Methotrexate resistance in an in vivo mouse tumor resulting from a novel mutation in the 
dihydrofolate reductase gene. Proceedings of the National Academy of Sciences (USA) 90, 11797-11801. 

20. Ercikan, E., Waltham, M., Dicker, A., Schweitzer, B., and Bertino, J. R. (1993) Effect of codon 22 mutations on 
substrate and inhibitor binding for human dihydrofolate reductase. Advances in Experimental Medicine and Biology 
338, 515-519. 

21. Banerjee, D., Schweitzer, B. I., Volkenandt, M., Li, M.-X., Waltham, M., Mineishi, S., Zhao, S.-C., and Bertino, J. R. 
(1994) Transfection with a cDNA encoding a Ser31 or Ser34 mutant human dihydrofolate reductase into Chinese 
hamster ovary and mouse marrow progenitor cells confers methotrexate resistance. Gene 139, 269-274. 

22. Li, M.-X., Banerjee, D., Zhao, S.-C., Schweitzer, B. I., Mineishi, S., Gilboa, E., and Bertino, J. R. (1994) Development
of a retroviral construct containing a human mutated dihydrofolate reductase cDNA for hematopoietic stem cell
transduction. Blood 83, 3403-3408.

23. Zhao, S.-C., Li, M.-X., Banerjee, D., Schweitzer, B. I., Mineishi, S., Gilboa, E., and Bertino, J. R. (1994) Long-term
protection of recipient mice from lethal doses of methotrexate by marrow infected with a double-copy vector retrovirus 
containing a mutant dihydrofolate reductase. Cancer Gene Therapy 1, 33-39. 

24. Schweitzer, B. I., Mikita, T., Kellogg, G. W., Gardner, K. H., and Beardsley, G. P. (1994) Solution structure of a DNA 
dodecamer containing the antineoplastic agent cytosine arabinoside: Determination by two- and three-dimensional 

This application claims the benefit under 35 U.S.C. § 119(e) of U.S. provisional
patent application Serial No. 60/201,921, filed on May 4, 2000, and U.S. provisional patent
application Serial No. 60/221,034, filed on July 27, 2000, each of which is incorporated
herein, by reference, in its entirety.

The government has certain rights in the invention.



I. Field of the Invention

The present invention relates to protein chips useful for the large-scale study of
protein function where the chip contains densely packed reaction wells. The invention
relates to methods of using protein chips to assay simultaneously the presence, amount,
and/or function of proteins present in a protein sample or on one protein chip, or to assay
the presence, relative specificity, and binding affinity of each probe in a mixture of probes
for each of the proteins on the chip. The invention also relates to methods of using the
protein chips for high density and small volume chemical reactions. Also, the invention
relates to polymers useful as protein chip substrates and methods of making protein chips.
The invention further relates to compounds useful for the derivatization of protein chip
substrates.

II. Background of the Invention 



The sequencing of entire genomes has resulted in the identification of large numbers
of open reading frames (ORFs). Currently, significant effort is devoted to understanding
gene function by mRNA expression patterns and by gene disruption phenotypes. Important
advances in this effort have been possible, in part, by the ability to analyze thousands of
gene sequences in a single experiment using gene chip technology. However, much
information about gene function comes from the analysis of the biochemical activities of the
encoded protein.

Currently, these types of analyses are performed by individual investigators studying
a single protein at a time. This is a very time-consuming process since it can take years to
purify and identify a protein based on its biochemical activity. The availability of an entire
genome sequence makes it possible to perform biochemical assays on every protein
encoded by the genome.

To this end, it would be useful to analyze hundreds or thousands of protein samples
using a single protein chip. Such approaches lend themselves well to high throughput
experiments in which large amounts of data can be generated and analyzed. Microtiter
plates containing 96 or 384 wells have been known in the field for many years. However,
the size (at least 12.8 cm x 8.6 cm) of these plates makes them unsuitable for the large-scale
analysis of proteins because the density of wells is not high enough.
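
The density shortfall can be made concrete with a short calculation. The following is a minimal sketch, assuming the full 12.8 cm x 8.6 cm plate footprint stated above is used as the area; the function name and constants are illustrative and do not come from the application.

```python
# Approximate well density of standard microtiter plates, using the
# 12.8 cm x 8.6 cm footprint quoted in the text as the plate area.
PLATE_LENGTH_CM = 12.8
PLATE_WIDTH_CM = 8.6


def wells_per_cm2(n_wells, length_cm=PLATE_LENGTH_CM, width_cm=PLATE_WIDTH_CM):
    """Return the number of wells per square centimeter of plate footprint."""
    return n_wells / (length_cm * width_cm)


density_96 = wells_per_cm2(96)    # roughly 0.9 wells per cm^2
density_384 = wells_per_cm2(384)  # roughly 3.5 wells per cm^2

print(f"96-well plate:  {density_96:.2f} wells/cm^2")
print(f"384-well plate: {density_384:.2f} wells/cm^2")
```

Under this assumption, both plate formats fall well below the density of at least 100 wells/cm² recited elsewhere in this application.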

As noted above, other types of arrays have been devised for use in DNA synthesis
and hybridization reactions, e.g., as described in WO 89/10977. However, these arrays are
unsuitable for protein analysis in discrete volumes because the arrays are constructed on flat
surfaces which tend to become cross-contaminated between features.

Photolithographic techniques have been applied to making a variety of arrays, from
oligonucleotide arrays on flat surfaces (Pease et al., 1994, "Light-generated oligonucleotide
arrays for rapid DNA sequence analysis," PNAS 91:5022-5026) to arrays of channels (U.S.
Patent No. 5,843,767) to arrays of wells connected by channels (Cohen et al., 1999, "A
microchip-based enzyme assay for protein kinase A," Anal. Biochem. 273:89-97).
Furthermore, microfabrication and microlithography techniques are well known in the
semiconductor fabrication area. See, e.g., Moreau, Semiconductor Lithography: Principles,
Practices and Materials, Plenum Press, 1988.

Methods have recently been developed for expressing large numbers of proteins with
potential utility for biochemical genomics in the budding yeast Saccharomyces cerevisiae.
ORFs have been cloned into an expression vector that uses the GAL promoter
and fuses the protein to a polyhistidine (e.g., HISX6) label. This method has thus far been
used to prepare and confirm expression of about 2000 yeast protein fusions (Heyman et al.,
1999, "Genome-scale cloning and expression of individual open reading frames using
topoisomerase I-mediated ligation," Genome Res. 9:383-392). Using a recombination
strategy, about 85% of the yeast ORFs have been cloned in frame with a GST coding region
in a vector that contains the CUP1 promoter (inducible by copper), thus producing GST
fusion proteins (Martzen et al., 1999, "A biochemical genomics approach for identifying
genes by the activity of their products," Science 286:1153-1155). Martzen et al. used a
pooling strategy to screen the collection of fusion proteins for several biochemical activities
(e.g., phosphodiesterase and Appr-1-P-processing activities) and identified the relevant
genes encoding these activities. However, strategies to analyze large numbers of individual
protein samples have not been described.

Thus, the need exists for a protein chip in which the wells are densely packed on the
chip so as to gain cost and time advantage over the prior art chips and methods. 

Citation or identification of any reference in Section II or any other section of this
application shall not be considered as admission that such reference is available as prior art
to the present invention.

III. Summary of the Invention 

The invention is directed to protein chips, i.e., positionally addressable arrays of
proteins on a solid support, useful for the large-scale study of protein function wherein the
protein chip contains densely packed reaction wells. The invention is also directed to
methods of using protein chips to assay the presence, amount, and/or functionality of
proteins present in at least one sample. The invention also is directed to methods of using
the protein chips for high density and small volume chemical reactions. Also, the invention
is directed to polymers useful as protein chip substrates and methods of making protein
chips. The invention is directed to compounds useful for the derivatization of protein chips.

In one embodiment, the present invention provides a protein chip comprising a flat
surface, such as, but not limited to, glass slides. Dense protein arrays can be produced on,
for example, glass slides, such that chemical reactions and assays can be conducted, thus
allowing large-scale parallel analysis of the presence, amount, and/or functionality of
proteins. In a specific embodiment, the flat surface array has proteins bound to its surface
via a 3-glycidooxypropyltrimethoxysilane (GPTS) linker.

Furthermore, in another specific embodiment, the present invention overcomes the
disadvantages and limitations of the methods and apparatus known in the art by providing
protein chips with densely packed wells in which chemical reactions and assays can be
conducted, thus allowing large-scale parallel analysis of the presence, amount, and/or
functionality of proteins.

The general advantages of assaying arrays rather than one-by-one assays include the
ability to simultaneously identify many protein-probe interactions, and to determine the
relative affinity of these interactions. The advantages of applying complex mixtures of
probes to a chip include the ability to detect interactions in a milieu more representative of
that in a cell, and the ability to simultaneously evaluate many potential ligands.

In one embodiment, the invention is a positionally addressable array comprising a
plurality of different substances, selected from the group consisting of proteins, molecules
comprising functional domains of said proteins, whole cells, and protein-containing cellular
material, on a solid support, with each different substance being at a different position on
the solid support, wherein the plurality of substances consists of at least 100 different
substances per cm².

In another embodiment, the invention is a positionally addressable array comprising
a plurality of different proteins, or molecules comprising functional domains of said
proteins, on a solid support, with each different protein or molecule being at a different
position on the solid support, wherein the plurality of different proteins or molecules
consists of at least 50% of all expressed proteins with the same type of biological activity in
the genome of an organism.

In yet another embodiment, the invention is a positionally addressable array
comprising a plurality of different substances, selected from the group consisting of
proteins, molecules comprising functional domains of said proteins, whole cells, and
protein-containing cellular material, on a solid support, with each different substance being
at a different position on the solid support, wherein the solid support is selected from the
group consisting of ceramics, amorphous silicon carbide, castable oxides, polyimides,
polymethylmethacrylates, polystyrenes and silicone elastomers.

In still another embodiment, the invention is a positionally addressable array
comprising a plurality of different substances, selected from the group consisting of
proteins, molecules comprising functional domains of said proteins, whole cells, and
protein-containing cellular material, on a solid support, with each different substance being
at a different position on the solid support, wherein the plurality of different substances are
attached to the solid support via a 3-glycidooxypropyltrimethoxysilane linker.

In another embodiment, the invention is an array comprising a plurality of wells on
the surface of a solid support wherein the density of the wells is at least 100 wells/cm².

The present invention also relates to a method of making a positionally addressable
array comprising a plurality of wells on the surface of a solid support comprising the step of
casting an array from a microfabricated mold designed to produce a density of greater than
100 wells/cm² on a solid surface. In another embodiment, the invention is a method of
making a positionally addressable array comprising a plurality of wells on the surface of a
solid support comprising the steps of casting a secondary mold from a microfabricated mold
designed to produce a density of wells on a solid surface of greater than 100 wells/cm² and
casting at least one array from the secondary mold.

In yet another embodiment, the invention is a method of using a positionally
addressable array comprising a plurality of different substances, selected from the group
consisting of proteins, molecules comprising functional domains of said proteins, whole
cells, and protein-containing cellular material, on a solid support, with each different
substance being at a different position on the solid support, wherein the plurality of different
substances consists of at least 100 different substances per cm², comprising the steps of
contacting a probe with the array, and detecting protein/probe interaction.

In still another embodiment, the invention is a method of using a positionally
addressable array comprising a plurality of different proteins, or molecules comprising
functional domains of said proteins, on a solid support, with each different protein or
molecule being at a different position on the solid support, wherein the plurality of proteins
and molecules consists of at least 50% of all expressed proteins with the same type of
biological activity in the genome of an organism, comprising the steps of contacting a probe
with the array, and detecting protein/probe interaction.

In another embodiment, the invention is a method of using a positionally
addressable array comprising a plurality of different substances, selected from the group
consisting of proteins, molecules comprising functional domains of said proteins, whole
cells, and protein-containing cellular material, on a solid support, with each different
substance being at a different position on the solid support, wherein the solid support is
selected from the group consisting of ceramics, amorphous silicon carbide, castable oxides,
polyimides, polymethylmethacrylates, polystyrenes and silicone elastomers, comprising the
steps of contacting a probe with the array, and detecting protein/probe interaction.

In yet another embodiment, the invention is a method of using a positionally
addressable array comprising a plurality of different substances, selected from the group
consisting of proteins, molecules comprising functional domains of said proteins, whole
cells, and protein-containing cellular material, on a solid support, with each different
substance being at a different position on the solid support, wherein the plurality of different
substances are attached to the solid support via a 3-glycidooxypropyltrimethoxysilane
linker, comprising the steps of contacting a probe with the array, and detecting
protein/probe interaction.

In still another embodiment, the invention is a method of using a positionally
addressable array comprising the steps of depositing a plurality of different substances,
selected from the group consisting of proteins, molecules comprising functional domains of
said proteins, whole cells, and protein-containing cellular material, on a solid support, with
each different substance being at a different position on the solid support, wherein the
plurality of different substances consists of at least 100 different substances per cm²,
contacting a probe with the array, and detecting protein/probe interaction.

In a specific embodiment, the invention is a method of using a positionally
addressable array comprising the steps of depositing a plurality of different substances,
selected from the group consisting of proteins, molecules comprising functional domains of
said proteins, whole cells, and protein-containing cellular material, on a solid support, with
each different substance being at a different position on the solid support, wherein the
plurality of different substances consists of at least 100 different substances per cm², and
wherein the solid support is a glass slide, contacting a probe with the array, and detecting
protein/probe interaction.

In another embodiment, the invention is a method of using a positionally
addressable array comprising the steps of depositing a plurality of different proteins, or
molecules comprising functional domains of said proteins, on a solid support, with each
different protein or molecule being at a different position on the solid support, wherein the
plurality of different proteins or molecules consists of at least 50% of all expressed proteins
with the same type of biological activity in the genome of an organism, contacting a probe
with the array, and detecting protein/probe interaction.

In another embodiment, the invention is a method of using a positionally
addressable array comprising the steps of depositing a plurality of different proteins, or
molecules comprising functional domains of said proteins, on a solid support, with each
different protein or molecule being at a different position on the solid support, wherein the
plurality of different proteins or molecules consists of at least 50% of all expressed proteins
with the same type of biological activity in the genome of an organism, and wherein the
solid support is a glass slide, contacting a probe with the array, and detecting protein/probe
interaction.

In another embodiment, the invention is a method of making a positionally
addressable array comprising the steps of casting an array from a microfabricated mold
designed to produce a density of wells on a solid surface of greater than 100 wells/cm² and
depositing in the wells a plurality of different substances, selected from the group consisting
of proteins, molecules comprising functional domains of said proteins, whole cells, and
protein-containing cellular material, on a solid support, with each different substance being
in a different well on the solid support.

In another embodiment, the invention is a method of making a positionally
addressable array comprising the steps of casting a secondary mold from a microfabricated
mold designed to produce a density of wells on a solid surface of greater than 100
wells/cm², casting at least one array from the secondary mold, and depositing in the wells a
plurality of different substances, selected from the group consisting of proteins, molecules
comprising functional domains of said proteins, whole cells, and protein-containing cellular
material, not attached to a solid support, with each different substance being in a different
well.

In yet another embodiment, the invention is a method of making a positionally
addressable array comprising the steps of casting a secondary mold from a microfabricated
mold designed to produce a density of wells on a solid surface of greater than 100
wells/cm², casting at least one array from the secondary mold, and depositing in the wells a
plurality of different substances, selected from the group consisting of proteins, molecules
comprising functional domains of said proteins, whole cells, and protein-containing cellular
material, with each different substance being in a different well.

A. Definitions 

As used in this application, "protein" refers to a full-length protein, portion of a
protein, or peptide. Proteins can be prepared from recombinant overexpression in an
organism, preferably bacteria, yeast, insect cells or mammalian cells, or produced via
fragmentation of larger proteins, or chemically synthesized.

As used in this application, "functional domain" is a domain of a protein which is
necessary and sufficient to give a desired functional activity. Examples of functional
domains include, inter alia, domains which exhibit kinase, protease, phosphatase,
glycosidase, acetylase, transferase, or other enzymatic activity. Other examples of
functional domains include those domains which exhibit binding activity towards DNA,
RNA, protein, hormone, ligand or antigen.

As used in this application, "probe" refers to any chemical reagent which binds to a
nucleic acid (e.g., DNA or RNA) or protein. Examples of probes include, inter alia, other
proteins, peptides, oligonucleotides, polynucleotides, DNA, RNA, small molecule
substrates and inhibitors, drug candidates, receptors, antigens, hormones, steroids,
phospholipids, antibodies, cofactors, cytokines, glutathione, immunoglobulin domains,
carbohydrates, maltose, nickel, dihydrotrypsin, and biotin.

Each protein or probe on a chip is preferably located at a known, predetermined 
position on the solid support such that the identity of each protein or probe can be 
determined from its position on the solid support. Further, the proteins and probes form a 
positionally addressable array on a solid support. 
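
The idea of positional addressability can be sketched as a lookup table from chip position to identity. The following is an illustrative sketch only: the function names and the row/column coordinate scheme are assumptions, and the layout entries are hypothetical apart from Mps1 at position B4, which is mentioned in the description of Figure 3.

```python
def build_array(assignments):
    """Build a position -> substance map, rejecting reuse of a position.

    `assignments` is an iterable of ((row, column), substance) pairs.
    """
    layout = {}
    for position, substance in assignments:
        if position in layout:
            raise ValueError(f"position {position} is already occupied")
        layout[position] = substance
    return layout


def identify(layout, position):
    """Return the identity of the substance at a known position."""
    return layout[position]


# Hypothetical layout for illustration.
layout = build_array([
    (("A", 1), "kinase_1"),
    (("A", 2), "kinase_2"),
    (("B", 4), "Mps1"),
])

print(identify(layout, ("B", 4)))
```

Because each position holds exactly one entry, a signal detected at a given position can be traced back to a single protein or probe without further analysis.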

IV. Brief Description of the Drawings 

Figure 1a. Using the depicted recombination strategy, 119 yeast protein kinases
were cloned in a high copy URA3 expression vector (pEGKG) that produces GST fusion
proteins under the control of the galactose-inducible GAL10 promoter. GST::kinase
constructs were rescued into E. coli, and sequences at the 5'-end of each construct were
determined. The whole procedure was repeated when mutations were discovered.

Figure 1b. Immunoblots of GST::kinase fusion proteins purified as described.
From three attempts, 106 kinase proteins were purified. In spite of repeated attempts, the
last 14 of 119 GST fusions were undetectable by immunoblotting analysis (e.g., Mps1 in
the lane labeled with a star).

Figure 2a. The protein chips used in the kinase study were produced according to
the following process, schematically depicted. The polydimethylsiloxane (PDMS) was
poured over an acrylic master mold. After curing, the chip containing the wells was peeled
away and mounted on a glass slide. Next, the surface of the chip was derivatized and
proteins were then attached to the wells. Wells were first blocked with 1% BSA, after
which kinase, ³³P-γ-ATP, and buffer were added. After incubation for 30 minutes at 30°C,
the protein chips were washed extensively, and exposed to both X-ray film and a Molecular
Dynamics PhosphorImager, which has a resolution of 50 µm and is quantitative. For twelve
substrates, each kinase assay was repeated at least twice; for the remaining five substrates,
the assays were performed once.

Figure 2b. An enlarged picture of a protein chip.

Figure 3. Protein chip and kinase assay results. Position 19 on every chip indicates
the signal of negative control. Mps1 at position B4 showed strong kinase activities in all 12
kinase reactions, although no visible signal could be detected on a western blot (Figure 1b).

Figure 4a. Quantitative analysis of protein kinase reactions. Kinase activities were
determined using a Molecular Dynamics PhosphorImager, and the data were exported into
an Excel spreadsheet. The kinase signals were then transformed into fold increases by
normalizing the data against negative control. Signals of 119 kinases in four reactions are
shown in log scale. The fold increases range from 1 to 1000 fold.

Figure 4b. To determine substrate specificity, a specificity index (SI) was calculated
using the following formula: SI_{ir} = F_{ir} / [(F_{i1} + F_{i2} + ... + F_{ir}) / r], where i represents the
identity of the kinase used, r represents the identity of the substrate, and F_{ir} represents the
fold increase of a kinase i on substrate r compared with GST alone. Several examples of
kinase specificity are shown when SI is greater than three.
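
The fold-increase normalization and the specificity index can be illustrated with a short calculation. This is a minimal sketch, assuming the denominator of SI is the mean fold increase of one kinase across all substrates tested; the function names, substrate labels, and signal values are hypothetical and are not data from the application.

```python
def fold_increase(signal, negative_control_signal):
    """Normalize a raw kinase signal against the negative-control signal."""
    return signal / negative_control_signal


def specificity_index(fold_by_substrate, substrate):
    """Fold increase on one substrate divided by the mean fold increase
    of the same kinase over all substrates tested."""
    mean_fold = sum(fold_by_substrate.values()) / len(fold_by_substrate)
    return fold_by_substrate[substrate] / mean_fold


# Hypothetical raw signals for one kinase on four substrates, with a
# negative-control signal of 10 arbitrary units.
raw_signals = {
    "substrate_1": 10.0,
    "substrate_2": 10.0,
    "substrate_3": 10.0,
    "substrate_4": 130.0,
}
folds = {name: fold_increase(value, 10.0) for name, value in raw_signals.items()}

si = specificity_index(folds, "substrate_4")
print(f"SI on substrate_4: {si:.2f}")
```

With these made-up numbers, the kinase would be counted as specific for the fourth substrate under the SI greater than three criterion.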
Figure 5a. Phylogenetic tree derived from the kinase core domain multiple
sequence alignment, illustrating the correlation between functional specificity and amino
acid sequences of the poly(Tyr-Glu) kinases. Kinases that can use poly(Tyr-Glu) as a substrate
often map to specific regions on a sequence comparison dendrogram. The kinases that
efficiently phosphorylate poly(Tyr-Glu) are indicated by shading; two kinases that weakly
use this substrate are indicated in boxes. Rad53 and Ste7, which could not phosphorylate
poly(Tyr-Glu), are indicated by asterisks. As shown, 70% of these kinases lie in four
sequence groups (circled).

Figure 5b. Structure of the rabbit muscle phosphorylase kinase (PHK)28. The
positions of three basic residues and a methionine (Met) residue, which are preferentially
found in kinases that can use poly(Tyr-Glu) as a substrate, are indicated. The asparagine
(Asp) residue is usually found in kinases that do not use poly(Tyr-Glu).

Figure 6. Cross sectional views of lithographic steps in a process of making protein
chips.

a. A silicon wafer with two layers of silicon on either side of an oxide layer.

b. The silicon wafer with a resistant mask layer on top.

c. The etching process removes silicon where the surface is unprotected by the
resistant mask. The depth of the etching is controlled by the position of the oxide layer, i.e.,
the etching process does not remove the oxide layer.

d. The mask layer is removed, leaving the etched silicon wafer.

e. The protein chip material is applied to the mold.

f. After curing, the protein chip is removed from the mold. The protein chip has an
image that is the negative of the mold.

Figure 7. Kinase/inhibitor assays on a protein chip. A human protein kinase A
(PKA), a human map kinase (MAPK), three yeast PKA homologs (TPK1, TPK2 and
TPK3), and two other yeast protein kinases (HSL1 and RCK1) were tested against two
substrates (i.e., a protein substrate for PKA and a commonly used kinase substrate, MBP)
using different concentrations of a specific human PKA inhibitor, PKIα, or a MAPK
inhibitor, SB202190. As shown in the figure, PKIα can specifically inhibit PKA activities
using both peptide and MBP as substrates. However, SB202190 did not show any
inhibitory effect on PKA activity. It is also interesting to note that PKIα did not inhibit the
three yeast PKA homologs (TPK1, TPK2, TPK3) or the other two yeast protein kinases
tested, HSL1 and RCK1.

V. Detailed Description of the Invention

The invention is directed to protein chips, i.e., positionally addressable arrays of
proteins on a solid support, useful for the large-scale study of protein function, wherein the
protein chip contains densely packed reaction wells. A positionally addressable array
provides a configuration such that each probe or protein of interest is located at a known,
predetermined position on the solid support such that the identity of each probe or protein
can be determined from its position on the array. The invention is also directed to methods
of using protein chips to assay the presence, amount, and/or functionality of proteins present
in at least one sample. The invention also is directed to methods of using the protein chips
for high density and small volume chemical reactions. Also, the invention is directed to
polymers useful as protein chip substrates and methods of making protein chips. The
invention further relates to compounds useful for the derivatization of protein chip
substrates.

In one embodiment, the invention is a positionally addressable array comprising a
plurality of different substances, selected from the group consisting of proteins, molecules
comprising functional domains of said proteins, whole cells, and protein-containing cellular
material, on a solid support, with each different substance being at a different position on
the solid support, wherein the plurality of different substances consists of at least 100
different substances per cm². In one embodiment, said plurality of different substances
consists of between 100 and 1000 different substances per cm². In another embodiment,
said plurality of different substances consists of between 1000 and 10,000 different
substances per cm². In another embodiment, said plurality of different substances consists
of between 10,000 and 100,000 different substances per cm². In yet another embodiment,
said plurality of different substances consists of between 100,000 and 1,000,000 different
substances per cm². In yet another embodiment, said plurality of different substances
consists of between 1,000,000 and 10,000,000 different substances per cm². In yet another
embodiment, said plurality of different substances consists of between 10,000,000 and
25,000,000 different substances per cm². In yet another embodiment, said plurality of
different substances consists of at least 25,000,000 different substances per cm². In yet
another embodiment, said plurality of different substances consists of at least
10,000,000,000 different substances per cm². In yet another embodiment, said plurality of
different substances consists of at least 10,000,000,000,000 different substances per cm².
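
To give a sense of scale for these density ranges, the following sketch converts a density into a center-to-center spacing. It assumes, purely for illustration, that the substances are laid out on a regular square grid; the application does not specify a grid geometry, and the function name is hypothetical.

```python
import math


def square_grid_pitch_um(density_per_cm2):
    """Center-to-center spacing, in micrometers, of a regular square grid
    holding `density_per_cm2` positions per square centimeter."""
    pitch_cm = 1.0 / math.sqrt(density_per_cm2)
    return pitch_cm * 1e4  # 1 cm = 10,000 micrometers


for density in (100, 10_000, 1_000_000, 25_000_000):
    pitch = square_grid_pitch_um(density)
    print(f"{density:>12,} per cm^2 -> {pitch:,.1f} um pitch")
```

Under this assumption, 100 substances per cm² corresponds to a spacing of about 1 mm, and 25,000,000 per cm² to a spacing of about 2 µm.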

In another embodiment, the invention is a positionally addressable array comprising
a plurality of different substances, selected from the group consisting of proteins, molecules
comprising functional domains of said proteins, whole cells, and protein-containing cellular
material, on a solid support, with each different substance being at a different position on
the solid support, wherein the plurality of different substances consists of at least 100
different substances per cm², and wherein the solid support is a glass slide.

In another embodiment, the invention is a positionally addressable array comprising
a plurality of different substances, selected from the group consisting of proteins, molecules
comprising functional domains of said proteins, whole cells, and protein-containing cellular
material, on a solid support, with each different substance being at a different position on
the solid support, wherein the plurality of different substances consists of about 30 to 100
different substances per cm². In a specific embodiment, said plurality of different
substances consists of 30 different substances per cm². In a particular embodiment, said
plurality of different substances consists of between 30 and 50 different substances per cm².
In another particular embodiment, said plurality of different substances consists of between
50 and 100 different substances per cm².

In various specific embodiments, the invention is a positionally addressable array
comprising a plurality of different proteins, or molecules comprising functional domains of
said proteins, on a solid support, with each different protein or molecule being at a different
position on the solid support, wherein the plurality of different proteins or molecules
consists of at least 50%, 75%, 90%, or 95% of all expressed proteins with the same type of
biological activity in the genome of an organism. For example, such organism can be
eukaryotic or prokaryotic, and is preferably a mammal, a human or non-human animal,
primate, mouse, rat, cat, dog, horse, cow, chicken, fungus such as yeast, Drosophila, C.
elegans, etc. Such type of biological activity of interest can be, but is not limited to,
enzymatic activity (e.g., kinase activity, protease activity, phosphatase activity, glycosidase
activity, acetylase activity, and other chemical group transferring enzymatic activity), nucleic
acid binding, hormone binding, etc.

A. Production of Protein Chips

The protein chips of the present invention, with their dense arrays of wells, are
preferably cast from master molds which have been stamped, milled, or etched using
conventional microfabrication or microlithographic techniques. Preferably conventional
microlithographic techniques and materials are utilized in the production of the master
molds. Once a master mold has been produced, the master mold may then be used directly
to mold the protein chips per se. Alternatively, secondary or tertiary molds can be cast from
the master mold and the protein chips cast from these secondary or tertiary molds.
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The master mold can be made from any material that is suitable for microfabrication 
or microlithography, with silicon, glass, quartz, polyimides, and polymethylmethacrylate 
(Lucite) being preferred. For microlithography, the preferred material is silicon wafers. 
Once the appropriate master, secondary, or tertiary mold has been produced, the
protein chip is cast. The protein chip can be cast in any solid support that is suitable for
casting, including either porous or non-porous solid supports. Ceramics, amorphous silicon
carbide, castable oxides that produce casts of SiO2 when cured, polyimides,
polymethylmethacrylates, and polystyrenes are preferred solid supports, with silicone
elastomeric materials being most preferred. Of the silicone elastomeric materials,
polydimethylsiloxane (PDMS) is the most preferred solid support. An advantage of silicone
elastomeric materials is the ease with which they are removed from the mold due to their
flexible nature.

Figure 6 illustrates an example of one method useful for realizing high-density
arrays of wells on protein chips according to this invention. A silicon wafer with an oxide
layer sandwiched between layers of silicon is provided (Figure 6a). Known as
silicon-on-insulator or SOI wafers, these wafers are commonly available from wafer supply
companies (e.g., Belle Mead Research, Belle Mead, NJ, and Virginia Semiconductor,
Fredericksburg, VA).

The silicon wafer is then patterned and etched via an etch process (Figures 6b-d).
The buried oxide layer acts as a very effective etch stop and results in highly uniform etch
depth across the wafer. Etch depth is independent of the etch process and is determined
solely by the thickness of the top silicon layer.

A wet chemical etch process (e.g., using KOH or tetramethylammonium hydroxide (TMAH))
can be utilized. However, this technique is slightly more dependent on the crystal
orientation of the silicon wafer. Thus, a technique using a rarefied gas (typically SF6) in a
reactive ion etch (RIE) is preferred. RIE etching techniques are capable of realizing highly
anisotropic wells in silicon that are independent of the crystal orientation of the silicon
wafer. The references G. Kovacs, Micromachined Transducers Sourcebook, Academic
Press (1998) and M. Madou, Fundamentals of Microfabrication, CRC Press (1997) provide
background on etching techniques.

Both types of microlithography can be utilized on a single chip to obtain the desired 
combination of well shapes. Wet-chemical etching is an isotropic process which gives U- 
shaped wells, while RIE is an anisotropic process which gives square bottomed wells. 

After the wafer is etched to realize a master mold, the mold can be used to cast protein
chips (Figures 6e-f). These cast structures can themselves be the protein chips, or can serve
as secondary or tertiary molds from which additional protein chips are cast.

Thus, in one embodiment, a method of making a positionally addressable array,
comprising a plurality of wells on the surface of a solid support, comprises casting an array
from a microfabricated mold designed to produce a density of wells on a solid surface of
greater than 100 wells/cm². In another embodiment, a method of making a positionally
addressable array, comprising a plurality of wells on the surface of a solid support,
comprises casting a secondary mold from said microfabricated mold designed to produce a
density of wells on a solid surface of greater than 100 wells/cm² and casting at least one
array from the secondary mold. In yet another embodiment, a method of making a
positionally addressable array comprises covering the mold with a liquid cast material, and
curing the cast material until the cast is solid. The liquid cast material is preferably silicone
elastomer, most preferably polydimethylsiloxane. Into any of these positionally addressable
arrays, a plurality of different substances, selected from the group consisting of proteins,
molecules comprising functional domains of said proteins, whole cells, and
protein-containing cellular material, can be deposited such that each different substance is
found in a different well on the solid support.

B. Features of Protein Chips 

The protein chips of the present invention are not limited in their physical
dimensions and may have any dimensions that are convenient. For the sake of
compatibility with current laboratory apparatus, protein chips the size of a standard
microscope slide or smaller are preferred. Most preferred are protein chips sized such that
two chips fit on a microscope slide. Also preferred are protein chips sized to fit into the
sample chamber of a mass spectrometer.

The wells in the protein chips of the present invention may have any shape such as
rectangular, square, or oval, with circular being preferred. The wells in the protein chips
may have square or round bottoms, V-shaped bottoms, or U-shaped bottoms. Square
bottoms are slightly preferred because the preferred reactive ion etch (RIE) process, which
is anisotropic, provides square-bottomed wells. The shape of the well bottoms need not be
uniform on a particular chip, but may vary as required by the particular assay being carried
out on the chip.

The wells in the protein chips of the present invention may have any width-to-depth
ratio, with ratios of width-to-depth between about 10:1 and about 1:10 being preferred. The
wells in the protein chips of the present invention may have any volume, with wells having
volumes of between 1 pl and 5 µl preferred and wells having volumes of between 1 nl and 1
µl being more preferred. The most preferred volume for a well is between 100 nl and 300
nl. For protein chips with very high densities of wells, the preferred volume of a well is
between 10 pl and 100 nl.
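
By way of illustration only, the relationship between well dimensions and the volume ranges
stated above can be sketched in Python for a circular, square-bottomed (cylindrical) well.
The cylindrical geometry, the function names, and the example dimensions are assumptions
made for the sketch and are not taken from the specification.

```python
import math

UM3_PER_NL = 1e6  # 1 nl = 1e-12 m^3 = 1e6 cubic micrometres


def cylindrical_well_volume_nl(diameter_um, depth_um):
    """Volume in nanolitres of an idealized cylindrical well (assumed geometry)."""
    radius_um = diameter_um / 2
    return math.pi * radius_um ** 2 * depth_um / UM3_PER_NL


def in_most_preferred_range(volume_nl, low_nl=100.0, high_nl=300.0):
    """True if the volume lies in the 100 nl to 300 nl range stated in the text."""
    return low_nl <= volume_nl <= high_nl
```

For example, a hypothetical well 500 µm in diameter and 1000 µm deep (a width-to-depth
ratio of 1:2) holds roughly 196 nl, which falls within the most preferred range.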

The protein chips of the invention can have a wide range of well densities (wells/cm²).
The preferred density of wells is between about 25 wells/cm² and about 10,000,000,000,000
wells/cm². Densities of wells on protein chips cast from master molds of laser milled Lucite
are generally between 1 well/cm² and 2,500 wells/cm². Appropriate milling tools produce
wells as small as 100 µm in diameter and 100 µm apart. Protein chips cast from master
molds etched by wet-chemical microlithographic techniques have densities of wells
generally between 50 wells/cm² and 10,000,000,000 wells/cm². Wet-chemical etching can
produce wells that are 10 µm deep and 10 µm apart, which in turn produces wells that are
less than 10 µm in diameter. Protein chips cast from master molds etched by RIE
microlithographic techniques have densities of wells generally between 100 wells/cm² and
25,000,000 wells/cm². RIE in combination with optical lithography can produce wells that
are 500 nm in diameter and 500 nm apart. Use of electron beam lithography in combination
with RIE can produce wells 50 nm in diameter and 50 nm apart. Wells of this size and with
equivalent spacing produce protein chips with well densities of 10,000,000,000,000
wells/cm². Preferably, RIE is used to produce wells of 20 µm in diameter and 20 µm apart.
Wells of this size that are equivalently spaced will result in densities of 25,000,000
wells/cm².
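
By way of illustration only, the upper density quoted above for laser milled Lucite (2,500
wells/cm² for wells 100 µm in diameter and 100 µm apart) can be checked with a short Python
sketch. The sketch assumes a square grid in which "apart" means edge-to-edge spacing, so that
the centre-to-centre pitch is the diameter plus the spacing; the specification does not state
the layout, and the other densities quoted may rest on different assumptions. The function
name is hypothetical.

```python
UM_PER_CM = 10_000  # 1 cm = 10,000 micrometres


def wells_per_cm2(diameter_um, spacing_um):
    """Well density for an idealized square grid (assumed layout).

    diameter_um -- well diameter in micrometres
    spacing_um  -- edge-to-edge gap between neighbouring wells in micrometres
    """
    pitch_um = diameter_um + spacing_um  # centre-to-centre distance
    wells_per_cm = UM_PER_CM / pitch_um  # wells along one centimetre
    return wells_per_cm ** 2
```

Under these assumptions, `wells_per_cm2(100, 100)` gives a pitch of 200 µm, 50 wells per
centimetre, and 2,500 wells/cm².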

The microfabrication and microlithographic techniques described above have been
used successfully to wet-chemically etch silicon wafers with well sizes of 560 nm or 280
nm with spacing of about 1 mm. This combination of wells and spacing produces arrays of
about 410,000 wells/cm² and about 610,000 wells/cm², respectively. When well size and
spacing are equivalent, protein chips with about 3.19 million wells/cm² and 12.75 million
wells/cm² are produced.

In one embodiment, the array comprises a plurality of wells on the surface of a solid
support wherein the density of wells is at least 100 wells/cm². In another embodiment, said
density of wells is between 100 and 1000 wells/cm². In another embodiment, said density
of wells is between 1000 and 10,000 wells/cm². In another embodiment, said density of
wells is between 10,000 and 100,000 wells/cm². In yet another embodiment, said density of
wells is between 100,000 and 1,000,000 wells/cm². In yet another embodiment, said
density of wells is between 1,000,000 and 10,000,000 wells/cm². In yet another
embodiment, said density of wells is between 10,000,000 and 25,000,000 wells/cm². In yet
another embodiment, said density of wells is at least 25,000,000 wells/cm². In yet another
embodiment, said density of wells is at least 10,000,000,000 wells/cm². In yet another
embodiment, said density of wells is at least 10,000,000,000,000 wells/cm².

C. Utilization of Protein Chips



In one embodiment, the present invention provides a protein chip comprising a flat
surface, such as, but not limited to, glass slides. Dense protein arrays can be produced on,
for example, glass slides, such that chemical reactions and assays can be conducted, thus
allowing large-scale parallel analysis of the presence, amount, and/or functionality of
proteins (e.g., protein kinases). Proteins or probes are bound covalently or non-covalently
to the flat surface of the solid support. The proteins or probes can be bound directly to the
flat surface of the solid support, or can be attached to the solid support through a linker
molecule or compound. The linker can be any molecule or compound that derivatizes the
surface of the solid support to facilitate the attachment of proteins or probes to the surface
of the solid support. The linker may covalently or non-covalently bind the proteins or
probes to the surface of the solid support. In addition, the linker can be an inorganic or
organic molecule. Preferred linkers are compounds with free amines. Most preferred
among linkers is 3-glycidoxypropyltrimethoxysilane (GPTS).

In another embodiment, the protein chips of the present invention have several
advantages over flat surface arrays. Namely, the use of wells eliminates or reduces the
likelihood of cross-contamination with respect to the contents of the wells. Another
advantage over flat surfaces is increased signal-to-noise ratios. Wells allow the use of
larger volumes of reaction solution in a denser configuration, and therefore greater signal is
possible. Furthermore, wells decrease the rate of evaporation of the reaction solution from
the chip as compared to flat surface arrays, thus allowing longer reaction times.

Another advantage of wells over flat surfaces is that the use of wells permits
association studies using a fixed, limited amount of probe for each well on the chip,
whereas the use of flat surfaces usually involves indiscriminate probe application across the
whole substrate. When a probe in a mixture of probes has a high affinity, but low
specificity, the indiscriminate application of the probe mixture across the substrate will
saturate many of the proteins with the high affinity probe. This saturation effectively limits
the detection of other probes in the mixture. By using wells, a limited amount of a probe
can be applied to individual wells on the chip. Thus, the amount of the probe applied to
individual proteins can be controlled, and the probe can be different for different proteins
(situated in different wells).

Once a protein chip is produced as described above, it can be used to conduct assays
and other chemical reactions. For assays, proteins or probes will generally be placed in the
wells. The presence or absence of proteins or probes will be detected by the application of
probes or proteins, respectively, to the protein chip. The protein-probe interaction can be
visualized using a variety of techniques known in the art, some of which are discussed
below.

Proteins useful in this invention can be fusion proteins, in which a defined domain is 
attached to one of a variety of natural proteins, or can be intact non-fusion proteins. 

In another embodiment, protein-containing cellular material, such as but not limited
to vesicles, endosomes, subcellular organelles, and membrane fragments, can be placed on
the protein chip (e.g., in wells). In another embodiment, a whole cell is placed on the
protein chip (e.g., in wells). In a further embodiment, the protein, protein-containing
cellular material, or whole cell is attached to the solid support of the protein chip.

The protein can be purified prior to placement on the protein chip or can be purified
during placement on the chip via the use of reagents that bind to particular proteins, which
have been previously placed on the protein chip. Partially purified protein-containing
cellular material or cells can be obtained by standard techniques (e.g., affinity or column
chromatography) or by isolating centrifugation samples (e.g., P1 or P2 fractions).

Furthermore, proteins, protein-containing cellular material, or cells can be embedded
in artificial or natural membranes prior to or at the time of placement on the protein chip. In
another embodiment, proteins, protein-containing cellular material, or cells can be
embedded in extracellular matrix component(s) (e.g., collagen or basal lamina) prior to or at
the time of placement on the protein chip. The proteins of the invention can be in solution,
or bound to the surface of the solid support (e.g., in a well, or on a flat surface), or bound to
a substrate (e.g., bead) placed in a well of the solid support.

The placement of proteins or probes in the wells can be accomplished by using any
dispensing means, such as bubble jet or ink jet printer heads. A micropipette dispenser is
preferred. The placement of proteins or probes can either be conducted manually or the
process can be automated through the use of a computer connected to a machine.

Since the wells are self-contained, the proteins or probes need not be attached or
bound to the surface of the solid support, but rather the proteins or probes can simply be
placed in the wells, or bound to a substrate (e.g., bead) that is placed in the wells. Other
substrates include, but are not limited to, nitrocellulose particles, glass beads, plastic beads,
magnetic particles, and latex particles. Alternatively, the proteins or probes are bound
covalently or non-covalently to the surface of the solid support in the wells. The proteins or
probes can be bound directly to the surface of the solid support (in the well), or can be
attached to the solid support through a linker molecule or compound. The linker can be any
molecule or compound that derivatizes the surface of the solid support to facilitate the
attachment of proteins or probes to the surface of the solid support. The linker may
covalently bind the proteins or probes to the surface of the solid support or the linker may
bind via non-covalent interactions. In addition, the linker can be an inorganic or organic
molecule. Preferred linkers are compounds with free amines. Most preferred among linkers
is 3-glycidoxypropyltrimethoxysilane (GPTS).

Proteins or probes which are non-covalently bound to the well surface may utilize a
variety of molecular interactions to accomplish attachment to the well surface such as, for
example, hydrogen bonding, van der Waals bonding, electrostatic, or metal-chelate
coordinate bonding. Further, DNA-DNA, DNA-RNA and receptor-ligand interactions are
types of interactions that utilize non-covalent binding. Examples of receptor-ligand
interactions include interactions between antibodies and antigens, DNA-binding proteins
and DNA, enzyme and substrate, avidin (or streptavidin) and biotin (or biotinylated
molecules), and interactions between lipid-binding proteins and phospholipid membranes or
vesicles. For example, proteins can be expressed with fusion protein domains that have
affinities for a substrate that is attached to the surface of the well. Suitable substrates for
fusion protein binding include trypsin/anhydrotrypsin, glutathione, immunoglobulin
domains, maltose, nickel, or biotin and its derivatives, which bind to bovine pancreatic
trypsin inhibitor, glutathione-S-transferase, antigen, maltose binding protein, poly-histidine
(e.g., HisX6 tag), and avidin/streptavidin, respectively.
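
By way of illustration only, the "respectively" pairing in the preceding sentence can be
written out as a lookup table in Python. The pairs themselves are those recited above; the
names `AFFINITY_PAIRS` and `surface_substrate_for` are hypothetical and serve only the
sketch.

```python
# Surface-attached substrate -> fusion domain (or binding partner) it captures,
# in the order recited in the text.
AFFINITY_PAIRS = {
    "trypsin/anhydrotrypsin": "bovine pancreatic trypsin inhibitor",
    "glutathione": "glutathione-S-transferase",
    "immunoglobulin domains": "antigen",
    "maltose": "maltose binding protein",
    "nickel": "poly-histidine (e.g., HisX6 tag)",
    "biotin and its derivatives": "avidin/streptavidin",
}


def surface_substrate_for(fusion_domain):
    """Return the surface substrate that binds the given fusion domain."""
    for substrate, partner in AFFINITY_PAIRS.items():
        if partner == fusion_domain:
            return substrate
    raise KeyError(f"no recited substrate for {fusion_domain!r}")
```

For example, a protein expressed as a glutathione-S-transferase fusion would be captured in
a well whose surface carries glutathione.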

D. Assays on Protein Chips 

In one embodiment, the protein chips are used in assays by using standard
enzymatic assays that produce chemiluminescence or fluorescence. Detection of various
proteins and molecular modifications can be accomplished using, for example,
photoluminescence, fluorescence using non-protein substrates, enzymatic color
development, mass spectroscopic signature markers, and amplification (e.g., by PCR) of
oligonucleotide tags. Thus, protein/probe interaction can be detected by, inter alia,
chemiluminescence, fluorescence, radiolabeling, or atomic force microscopy. Probes
binding to specific elements in the array can also be identified by direct mass spectrometry.
For example, probes released into solution by non-degradative methods, which dissociate
the probes from the array elements, can be identified by mass spectrometry (see, e.g., WO
98/59361). In another example, peptides or other compounds released into solution by
enzymatic digests of the array elements can be identified by mass spectrometry.

The types of assays fall into several general categories. As a first example, each
well on the array is exposed to a single probe whose binding is detected and quantified. The
results of these assays are visualized by methods including, but not limited to: 1) using
radioactively labeled ligand followed by autoradiography and/or phosphoimager analysis;
2) binding of hapten, which is then detected by a fluorescently labeled or enzymatically
labeled antibody or high affinity hapten ligand such as biotin or streptavidin; 3) mass
spectrometry; 4) atomic force microscopy; 5) fluorescent polarization methods; 6) rolling
circle amplification-detection methods (Hatch et al., 1999, "Rolling circle amplification of
DNA immobilized on solid surfaces and its application to multiplex mutation detection",
Genet. Anal. 15(2):35-40); 7) competitive PCR (Fini et al., 1999, "Development of a
chemiluminescence competitive PCR for the detection and quantification of parvovirus B19
DNA using a microplate luminometer", Clin Chem. 45(9):1391-6; Kruse et al., 1999,
"Detection and quantitative measurement of transforming growth factor-beta1 (TGF-beta1)
gene expression using a semi-nested competitive PCR assay", Cytokine 11(2):179-85;
Guenthner and Hart, 1998, "Quantitative, competitive PCR assay for HIV-1 using a
microplate-based detection system", Biotechniques 24(5):810-6); 8) colorimetric
procedures; and 9) biological assays, e.g., for virus titers.

As a second example, each well on the array is exposed to multiple probes
concurrently, including pooling of probes from several sources, whose binding is detected
and quantified. The results of these assays are visualized by methods including, but not
limited to: 1) mass spectrometry; 2) atomic force microscopy; 3) infrared or
fluorescently labeled compounds or proteins; 4) amplifiable oligonucleotides, peptides or
molecular mass labels; and 5) by stimulation or inhibition of the protein's enzymatic
activity. Information is gleaned from mixtures of probes because of the positionally
addressable nature of the arrays of the present invention, i.e., through the placement of
defined proteins at known positions on the protein chip, the identity of the protein to which
a bound probe binds is known. If so desired, positions on the array that demonstrate binding
can then be probed with individual probes to identify the specific interaction of interest.
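
By way of illustration only, the use of positional addressability to interpret a pooled-probe
assay can be sketched in Python. The layout, the signal values, the threshold, and the
function name `identify_bound_proteins` are all hypothetical; the sketch shows only that a
signal at a known position identifies the protein to which a probe has bound.

```python
def identify_bound_proteins(layout, signal, threshold):
    """Return [(position, protein)] for positions whose signal meets the threshold.

    layout    -- dict mapping (row, column) to the protein placed at that position
    signal    -- dict mapping (row, column) to a measured signal (arbitrary units)
    threshold -- signal level treated as binding (an assumed cut-off)
    """
    hits = []
    for position, protein in sorted(layout.items()):
        # Positions with no reading are treated as showing no binding.
        if signal.get(position, 0.0) >= threshold:
            hits.append((position, protein))
    return hits
```

The positions returned are those that could then be probed with individual probes from the
pool to identify the specific interaction of interest.
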
Useful information also can be obtained, for example, by incubating a protein chip
with cell extracts, wherein each well on the chip contains a reaction mix to assay an
enzymatic activity of interest, and wherein a plurality of different enzymatic and/or
substrate activities are assayed, and thereby identifying and measuring the cellular
repertoire of particular enzymatic activities. Similarly, the protein chip can be incubated
with whole cells or preparations of plasma membranes to assay, for example, for expression
of membrane-associated proteins or molecules, or binding properties of cell surface proteins
or molecules. Cells, markers on a cell, or substances secreted by a cell that bind to
particular locations on the protein chip can be detected using techniques known in the art.
For example, protein chips containing arrays of antigens can be screened with B-cells or
T-cells, wherein the antigens are selected from the group consisting of synthetic antigens,
tissue-specific antigens, disease-specific antigens, antigens of pathogens, and antigens of
autologous tissues. The antigen or antigenic determinant recognized by the lymphocytes
can be determined by establishing at what position on the array activation of the cells by
antigen occurs. Lymphocyte activation can be assayed by various means including, but not
limited to, detecting antibody synthesis, detecting or measuring incorporation of
³H-thymidine, probing of cell surface molecules with labeled antibodies to identify molecules
induced or suppressed by antigen recognition and activation (e.g., IgD, C3b receptor, IL-2
receptor, transferrin receptor, membrane class II MHC molecules, CD23, CD38, PCA-1
molecules, HLA-DR), and identifying expressed and/or secreted cytokines.

In another example, mitogens for a specific cell-type can be determined by
incubating the cells with protein chips containing arrays of putative mitogens, comprising
the steps of contacting a positionally addressable array with a population of cells; said array
comprising a plurality of different substances, selected from the group consisting of
proteins, molecules comprising functional domains of said proteins, whole cells, and
protein-containing cellular material, on a solid support, with each different substance being
at a different position on the solid support, wherein the density of different substances is at
least 100 different substances per cm²; and detecting positions on the solid support where
mitogenic activity is induced in a cell. Cell division can be assayed by, for example,
detecting or measuring incorporation of ³H-thymidine by a cell. Cells can be of the same
cell type (i.e., a homogeneous population) or can be of different cell types.

In yet another example, cellular uptake and/or processing of proteins on the protein
chips can be assayed by, for example, using radioactively labeled protein substrates and
measuring either a decrease in radioactive substrate concentration or uptake of radioactive
substrate by the cells. These assays can be used for either diagnostic or therapeutic
purposes. One of ordinary skill in the art can appreciate many appropriate assays for
detecting various types of cellular interactions.

Thus, use of several classes of probes (e.g., known mixtures of probes, cellular
extracts, subcellular organelles, cell membrane preparations, whole cells, etc.) can provide
for large-scale or exhaustive analysis of cellular activities. In particular, one or several
screens can form the basis of identifying a "footprint" of the cell type or physiological state
of a cell, tissue, organ or system. For example, different cell types (either morphological or
functional) can be differentiated by the pattern of cellular activities or expression
determined by the protein chip. This approach also can be used to determine, for example,
different stages of the cell cycle, disease states, altered physiologic states (e.g., hypoxia),
physiological state before or after treatment (e.g., drug treatment), metabolic state, stage of
differentiation or development, response to environmental stimuli (e.g., light, heat), cell-cell
interactions, cell-specific gene and/or protein expression, and disease-specific gene and/or
protein expression.

Enzymatic reactions can be performed and enzymatic activity measured using the
protein chips of the present invention. In a specific embodiment, compounds that modulate
the enzymatic activity of a protein or proteins on a chip can be identified. For example,
changes in the level of enzymatic activity are detected and quantified by incubation of a
compound or mixture of compounds with an enzymatic reaction mixture in wells of the
protein chip, wherein a signal is produced (e.g., from substrate that becomes fluorescent
upon enzymatic activity). Differences between the presence and absence of the compound
are noted. Furthermore, the differences in effects of compounds on enzymatic activities of
different proteins are readily detected by comparing their relative effect on samples within
the protein chips and between chips.
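
By way of illustration only, the comparison of enzymatic signal in the presence and absence
of a compound can be sketched in Python. The percent-change metric, the function names, and
the example readings are assumptions made for the sketch; the specification does not
prescribe a particular calculation.

```python
def percent_change(with_compound, without_compound):
    """Percent change in signal caused by the compound (negative means inhibition)."""
    if without_compound == 0:
        raise ValueError("baseline signal must be non-zero")
    return 100.0 * (with_compound - without_compound) / without_compound


def rank_effects(baseline, treated):
    """Return [(protein, percent change)] sorted from strongest inhibition upward.

    baseline -- dict mapping protein name to signal without the compound
    treated  -- dict mapping protein name to signal with the compound
    """
    effects = {
        protein: percent_change(treated[protein], baseline[protein])
        for protein in baseline
    }
    return sorted(effects.items(), key=lambda item: item[1])
```

For example, hypothetical readings of 200 and 400 units without the compound and 50 and 380
units with it correspond to changes of -75% and -5%, indicating that the compound affects
the first enzyme far more strongly than the second.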

The variety of strategies of using the high density protein chips of the present
invention, detailed above, can be used to determine various physical and functional
characteristics of proteins. For example, the protein chips can be used to assess the
presence and amount of protein present by probing with an antibody. In one embodiment, a
polydimethylsiloxane (PDMS) chip of GST fusion proteins can be probed to determine the
presence of a protein and/or its level of activity. The protein can be detected using standard
detection assays such as luminescence, chemiluminescence, fluorescence or
chemifluorescence. For example, a primary antibody to the protein of interest is recognized
by a fluorescently labeled secondary antibody, which is then measured with an instrument
(e.g., a Molecular Dynamics scanner) that excites the fluorescent product with a light source
and detects the subsequent fluorescence. For greater sensitivity, a primary antibody to the
protein of interest is recognized by a secondary antibody that is conjugated to an enzyme
such as alkaline phosphatase or horseradish peroxidase. In the presence of a luminescent
substrate (for chemiluminescence) or a fluorogenic substrate (for chemifluorescence),
enzymatic cleavage yields a highly luminescent or fluorescent product which can be
detected and quantified by using, for example, a Molecular Dynamics scanner.
Alternatively, the signal of a fluorescently labeled secondary antibody can be amplified
using an alkaline phosphatase-conjugated or horseradish peroxidase-conjugated tertiary
antibody.

Identifying substrates of protein kinases, phosphatases, proteases, glycosidases,
acetylases, or other group transferring enzymes can also be conducted on the protein chips
of the present invention. For example, a wide variety of different probes are attached to the
protein chip and assayed for their ability to act as a substrate for particular enzyme(s), e.g.,
assayed for their ability to be phosphorylated by protein kinases. Detection methods for
kinase activity include, but are not limited to, the use of radioactive labels, such as ^^P-ATP
and ³⁵S-γ-ATP, or fluorescent antibody probes that bind to phosphoamino acids. For
example, whereas incorporation into a protein of radioactively labeled phosphorus indicates
kinase activity in one assay, another assay can measure the release of radioactively labeled
phosphorus into the media, which indicates phosphatase activity. In another example,
protease activity can be detected by identifying, using standard assays (e.g., mass
spectrometry, fluorescently labeled antibodies to peptide fragments, or loss of fluorescence
signal from a fluorescently tagged substrate), peptide fragments that are produced by
protease activity and released into the media. Thus, activity of group-transferring enzymes
can be assayed readily using several approaches and many independent means of detection,
which would be appreciated by one of ordinary skill in the art.

Protein chips can be used to identify proteins on the chip that have specific activities
such as specific kinases, proteases, nucleic acid binding properties, nucleotide hydrolysis,
hormone binding and DNA binding. Thus, the chip can be probed with a probe that will
indicate the presence of the desired activity. For example, if DNA binding is the activity of
interest, the chip containing candidate DNA-binding proteins is probed with DNA.

The search for probes (natural or synthetic) that are protein or nucleic acid ligands
for an array of proteins can be carried out in parallel on a protein chip. A probe can be a
cell, protein-containing cellular material, protein, oligonucleotide, polynucleotide, DNA,
RNA, small molecule substrate, drug candidate, receptor, antigen, steroid, phospholipid,
antibody, immunoglobulin domain, glutathione, maltose, nickel, dihydrotrypsin, or biotin.
Alternatively, the probe can be an enzyme substrate or inhibitor. For example, the probe
can be a substrate or inhibitor of an enzyme chosen from the group consisting of kinases,
phosphatases, proteases, glycosidases, acetylases, and other group transferring enzymes.
After incubation of proteins on a chip with combinations of nucleic acid or protein probes,
the bound nucleic acid or protein probes can be identified by mass spectrometry (Lakey et
al., 1998, "Measuring protein-protein interactions", Curr Opin Struct Biol. 8:119-23).

The identity of target proteins from pathogens (e.g., an infectious disease agent such
as a virus, bacterium, fungus, or parasite) or target proteins from abnormal cells (e.g.,
neoplastic cells, diseased cells, or damaged cells) that serve as antigens in the immune
response of recovering or non-recovering patients can be determined by using a protein chip
of the invention. For example, lymphocytes isolated from a patient can be used to screen
protein chips comprising arrays of a pathogen's proteins on a protein chip. In general, these
screens comprise contacting a positionally addressable array with a plurality of
lymphocytes, said array comprising a plurality of potential antigens on a solid support, with
each different antigen being at a different position on the solid support, wherein the density
of different antigens is at least 100 different antigens per cm², and detecting positions on the
solid support where lymphocyte activation occurs. In a specific embodiment, lymphocytes
are contacted with a pathogen's proteins on an array, after which activation of B-cells or
T-cells by an antigen or a mixture of antigens is assayed, thereby identifying target antigens
derived from a pathogen.

Alternatively, the protein chips are used to characterize an immune response by, for
example, screening arrays of potential antigens to identify the targets of a patient's B-cells
and/or T-cells. For example, B-cells can be incubated with an array of potential antigens
(i.e., molecules having antigenic determinants) to identify antigenic targets for
humoral-based immunity. The source of antigens can be, for example, from autologous tissues,
collections of known or unknown antigens (e.g., of pathogenic microorganisms),
tissue-specific or disease-specific antigen collections, or synthetic antigens.

In another embodiment, lymphocytes isolated from a patient can be used to screen 
protein chips comprising arrays of proteins derived from a patient's own tissues. Such
screens can identify substrates of autoimmunity or allergy-causing proteins, and thereby
diagnose autoimmunity or allergic reactions, and/or identify potential target drug
candidates. 

In another embodiment, the protein chips of the invention are used to identify 
substances that are able to activate B-cells or T-cells. For example, lymphocytes are
contacted with arrays of test molecules or proteins on a chip, and lymphocyte activation is
assayed, thereby identifying substances that have a general ability to activate B-cells or
T-cells or subpopulations of lymphocytes (e.g., cytotoxic T-cells).

Induction of B-cell activation by antigen recognition can be assayed by various 
means including, but not limited to, detecting or measuring antibody synthesis,
incorporation of ³H-thymidine, binding of labeled antibodies to newly expressed or
suppressed cell surface molecules, and secretion of factors indicative of B-cell activation
(e.g., cytokines). Similarly, T-cell activation in a screen using a protein chip of the
invention can be determined by various assays. For example, a chromium (⁵¹Cr) release
assay can detect recognition of antigen and subsequent activation of cytotoxic T-cells (see,
e.g., Palladino et al., 1987, Cancer Res. 47:5074-9; Blachere et al., 1993, J. Immunotherapy
14:352-6). 

The specificity of an antibody preparation can be determined through the use of a 
protein chip of the invention, comprising contacting a positionally addressable array with an 
antibody preparation, said array comprising a plurality of potential antigens on a solid 
support, with each different antigen being at a different position on the solid support,
wherein the density of different antigens is at least 100 different antigens per cm², and
detecting positions on the solid support where binding by an antibody in the antibody
preparation occurs. The antibody preparation can be, but is not limited to, Fab fragments,
antiserum, and polyclonal, monoclonal, chimeric, single chain, humanized, or synthetic
antibodies. For example, an antiserum can be characterized by screening disease-specific,
tissue-specific, or other identified collections of antigens, and determining which antigens
are recognized. In a specific embodiment, protein chip arrays having similar or related 
antigens are screened with monoclonal antibodies to evaluate the degree of specificity by 
determining to which antigens on the array a monoclonal antibody binds. 

The identity of targets of specific cellular activities can be assayed by treating a
protein chip with complex protein mixtures, such as cell extracts, and determining protein
activity. For example, a protein chip containing an array of different kinases can be
contacted with a cell extract from cells treated with a compound (e.g., a drug), and assayed
for kinase activity. In another example, a protein chip containing an array of different
kinases can be contacted with a cell extract from cells at a particular stage of cell
differentiation (e.g., pluripotent) or from cells in a particular metabolic state (e.g., mitotic),
and assayed for kinase activity. The results obtained from such assays, comparing for
example, cells in the presence or absence of a drug, or cells at several differentiation stages,
or cells in different metabolic states, can provide information regarding the physiologic
changes in the cells between the different conditions.

Alternatively, the identity of targets of specific cellular activities can be assayed by
treating a protein chip of the invention, containing many different proteins (e.g., a peptide
library), with a complex protein mixture (e.g., a cell extract), and assaying for
modifications to the proteins on the chip. For example, a protein chip containing an array
of different proteins can be contacted with a cell extract from cells treated with a compound
(e.g., a drug), and assayed for kinase, protease, glycosidase, acetylase, phosphatase, or
other transferase activity. In another example, a protein chip containing an
array of different proteins can be contacted with a cell extract from cells at a particular stage
of cell differentiation (e.g., pluripotent) or from cells in a particular metabolic state (e.g.,
mitotic). The results obtained from such assays, comparing for example, cells in the
presence or absence of a drug, or cells at several stages of differentiation, or cells in
different metabolic states, can provide information regarding the physiologic effect on the
cells under these conditions. 

The protein chips are useful to identify probes that bind to specific molecules of 
biologic interest including, but not limited to, receptors for potential ligand molecules, virus
receptors, and ligands for orphan receptors.

The protein chips are also useful for detecting DNA binding or RNA binding to
proteins on the protein chips, and for determining the binding specificity. In addition,
particular classes of RNA-binding or DNA-binding proteins (e.g., zinc-finger proteins) can
be studied with the protein chips by screening arrays of these proteins with nucleic acid
sequences, and determining binding specificity and binding strength.

The identity of proteins exhibiting differences in function, ligand binding, or
enzymatic activity of similar biological entities can be analyzed with the protein chips of
the present invention. For example, protein isoforms derived from different
alleles are assayed for their activities relative to one another.

The high density protein chips can be used for drug discovery, analysis of the mode
of action of a drug, drug specificity, and prediction of drug toxicity. For example, the
identity of proteins that bind to a drug, and their relative affinities, can be assayed by
incubating the proteins on the chip with a drug or drug candidate under different assay
conditions, determining drug specificity by determining where on the array the drug bound,
and measuring the amount of drug bound by each different protein. Bioassays in which a
biological activity is assayed, rather than binding assays, can alternatively be carried out on
the same chip, or on an identical second chip. Thus, these types of assays using the protein
chips of the invention are useful for studying drug specificity, predicting potential side
effects of drugs, and classifying drugs. Further, protein chips of the invention are suitable
for screening complex libraries of drug candidates. Specifically, the proteins on the chip
can be incubated with the library of drug candidates, and then the bound components can be
identified, e.g., by mass spectrometry, which allows for the simultaneous identification of
all library components that bind preferentially to specific subsets of proteins, or bind to
several, or all, of the proteins on the chip. Further, the relative affinity of the drug
candidates for the different proteins in the array can be determined.

Moreover, the protein chips of the present invention can be probed in the presence 
of potential inhibitors, catalysts, modulators, or enhancers of a previously observed 
interaction, enzymatic activity, or biological response. In this manner, for example, 
blocking of the binding of a drug, or disruption of virus or physiological effectors to
specific categories of proteins, can be analyzed by using a protein chip of the present
invention. 

The protein chips of the invention can be used to determine the effects of a drug on 
the modification of multiple targets by complex protein mixtures, such as for example, 
whole cells, cell extracts, or tissue homogenates. The net effect of a drug can be analyzed 
by screening one or more protein chips with drug-treated cells, tissues, or extracts, which
then can provide a "signature" for the drug-treated state, and when compared with the
"signature" of the untreated state, can be of predictive value with respect to, for example,
potency, toxicity, and side effects. Furthermore, time-dependent effects of a drug can be
assayed by, for example, adding the drug to the cell, cell extract, tissue homogenate, or
whole organism, and applying the drug-treated cells or extracts to a protein chip at various
timepoints of the treatment.
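
By way of illustration only, the comparison of a drug-treated "signature" with the "signature" of the untreated state could be computed along the following lines. This is a minimal sketch and not a method disclosed herein: the function name signature_shift, the use of a log2 ratio, the pseudocount, and the example array positions and signal values are all assumptions made for the example.

```python
import math


def signature_shift(treated, untreated, pseudocount=1.0):
    """Compare two chip "signatures" position by position.

    Each signature maps an array position (hypothetical labels such as "A1")
    to a measured signal. The result is the log2 ratio of treated to untreated
    signal for every position present in both signatures. The pseudocount is
    an assumed guard against division by zero for empty positions.
    """
    shared = treated.keys() & untreated.keys()
    return {
        pos: math.log2((treated[pos] + pseudocount) / (untreated[pos] + pseudocount))
        for pos in sorted(shared)
    }


# Hypothetical signals for three array positions.
treated = {"A1": 7.0, "A2": 1.0, "A3": 3.0}
untreated = {"A1": 1.0, "A2": 3.0, "A3": 3.0}
shift = signature_shift(treated, untreated)
```

Positive values in such a comparison would indicate positions whose signal rose in the drug-treated state, negative values positions whose signal fell, and values near zero positions that were unaffected.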

Screening of phage display libraries can be performed by incubating a library with 
the protein chips of the present invention. Binding of positive clones can be determined by 
various methods known in the art (e.g., mass spectrometry), thereby identifying clones of
interest, after which the DNA encoding the clones of interest can be identified by standard
methods (see, e.g., Ames et al., 1995, J. Immunol. Methods 184:177-86; Kettleborough et
al., 1994, Eur. J. Immunol. 24:952-8; Persic et al., 1997, Gene 187:9-18). In this manner,
the chips are useful to select for cells having surface components that bind to specific
proteins on the chip. Alternatively, a phage display library can be attached to the chip, such
that a positionally addressable array of the library is created, after which the array can be
screened repeatedly with different mixtures of probes.

The invention also provides kits for carrying out the assay regimens of the 
invention. In a specific embodiment, kits of the invention comprise one or more arrays of 
the invention. Such kits may further comprise, in one or more containers, reagents useful 
for assaying biological activity of a protein or molecule, reagents useful for assaying
interaction of a probe and a protein or molecule, reagents useful for assaying the biological
activity of a protein or molecule having a biological activity of interest, and/or one or more
probes, proteins or other molecules. The reagents useful for assaying biological activity of
a protein or molecule, or assaying interactions between a probe and a protein or molecule,
can be contained in each well or selected wells on the protein chip. Such reagents can be in
solution or in solid form. The reagents may include either or both the proteins or molecules
and the probes required to perform the assay of interest. 

In one embodiment, a kit comprises one or more protein chips (i.e., positionally
addressable arrays comprising a plurality of different substances, selected from the group
consisting of proteins, molecules comprising functional domains of said proteins, whole
cells, and protein-containing cellular material, on a solid support, with each different
substance being at a different position on the solid support), wherein the plurality of
different substances consists of at least 100 different substances per cm², and in one or more
containers, one or more probes, reagents, or other molecules. The substances of the array
can be attached to the surface of wells on the solid support. In another embodiment, the
protein chip in the kit can have the protein or probe already attached to the wells of the solid
support. In yet another embodiment, the protein chip in the kit can have the reagent(s) or
reaction mixture useful for assaying biological activity of a protein or molecule, or assaying
interaction of a probe and a protein or molecule, already attached to the wells of the solid
support. In yet another embodiment, the reagent(s) is not attached to the wells of the solid
support, but is contained in the wells. In yet another embodiment, the reagent(s) is not
attached to the wells of the solid support, but is contained in one or more containers, and
can be added to the wells of the solid support. In yet another embodiment, the kit further
comprises one or more containers holding a solution reaction mixture for assaying
biological activity of a protein or molecule. In yet another embodiment, the kit provides a
substrate (e.g., beads) to which probes, proteins or molecules of interest, and/or other
reagents useful for carrying out one or more assays, can be attached, after which the
substrate with attached probes, proteins, or other reagents can be placed into the wells of the
chip. 

In another embodiment, one or more protein chips in the kit have, attached to the 
wells of the solid support, proteins with a biological activity of interest. In another
embodiment, one or more protein chips in the kit have, attached to the wells of the solid
support, at least 50%, 75%, 90% or 95% of all expressed proteins with the same type of
biological activity in the genome of an organism. In a specific embodiment, one or more
protein chips in the kit have, attached to the wells of the solid support, at least 50%, 75%,
90% or 95% of all expressed kinases, phosphatases, glycosidases, proteases, acetylases, other
group transferring enzymes, nucleic acid binding proteins, hormone-binding proteins or
DNA-binding proteins, within the genome of an organism (e.g., of a particular species).

E. Proteins Useful with the Protein Chips 

Full-length proteins, portions of full-length proteins, and peptides, whether prepared
from recombinant overexpression in an organism, produced via fragmentation of larger
proteins, or chemically synthesized, are utilized in this invention to form the protein chip.
Organisms whose proteins are overexpressed include, but are not limited to, bacteria, yeast,
insects, humans, and non-human mammals such as mice, rats, cats, dogs, pigs, cows and
horses. Further, fusion proteins in which a defined domain is attached to one of a variety of
natural or synthetic proteins can be utilized. Proteins used in this invention can be purified
prior to being attached to, or deposited into, the wells of the protein chip, or purified during
attachment via the use of reagents which have been previously attached to, or deposited
into, the wells of the protein chip. These reagents include those that specifically bind
proteins in general, or bind to a particular group of proteins. Proteins can be embedded in
artificial or natural membranes (e.g., liposomes, membrane vesicles) prior to, or at the time
of, attachment to the protein chip. Alternatively, the proteins can be delivered into the wells
of the protein chip. 

Proteins used in the protein chips of the present invention are preferably expressed 
by methods known in the art. The InsectSelect system from Invitrogen (Carlsbad, CA,
catalog no. K800-01), a non-lytic, single-vector insect expression system that simplifies
expression of high-quality proteins and eliminates the need to generate and amplify virus
stocks, is a preferred expression system. The preferred vector in this system is pIB/V5-His
TOPO TA vector (catalog no. K890-20). Polymerase chain reaction (PCR) products can be
cloned directly into this vector, using the protocols described by the manufacturer, and the
proteins are then expressed with N-terminal histidine (His) labels which can be used to
purify the expressed protein. 

The BAC-TO-BAC™ system, another eukaryotic expression system in insect cells, 
available from Lifetech (Rockville, MD), is also a preferred expression system. Rather than 
using homologous recombination, the BAC-TO-BAC™ system generates recombinant
baculovirus by relying on site-specific transposition in E. coli. Gene expression is driven
by the highly active polyhedrin promoter, and therefore can represent up to 25% of the 
cellular protein in infected insect cells. 

VI. Example I: Analysis of Yeast Protein Kinases Using Protein Chips 

A. Introduction

The following example exemplifies the various aspects of protein chip production 
and a method of using the protein chips of the present invention. The protein chip
technology of the present invention is suitable for rapidly analyzing large numbers of
samples, and therefore this approach was applied to the analysis of nearly all yeast protein
kinases. Protein kinases catalyze protein phosphorylation and play a pivotal role in
regulating basic cellular functions, such as cell cycle control, signal transduction, DNA
replication, gene transcription, protein translation, and energy metabolism. The availability
of a complete genome sequence makes it possible to analyze all of the protein kinases
encoded by an organism and determine their in vitro substrates. 

The yeast genome has been sequenced and contains approximately 6200 open 
reading frames greater than 100 codons in length; 122 of these are predicted to encode 
protein kinases. Twenty-four of these protein kinase genes have not been studied
previously. Except for two histidine protein kinases, all of the yeast protein kinases are
members of the Ser/Thr family; tyrosine kinase family members do not exist, although seven
protein kinases that phosphorylate serine/threonine and tyrosine have been reported.

With the development of the protein chip technology of the present invention, the 
high throughput analysis of the biochemical activities of nearly all of the protein kinases 
from Saccharomyces cerevisiae has been conducted as described herein. Protein chips
utilized were disposable arrays of 300 nl wells in silicone elastomer sheets placed on top of
microscope slides. The high density and small size of the wells allow for high throughput
batch processing and simultaneous analysis of many individual samples, requiring only
small amounts of protein. Using protein chips of the present invention, Saccharomyces
cerevisiae kinase proteins (119 different kinases in total) were fused to
glutathione-S-transferase (GST), overexpressed in yeast, then purified and assayed for their ability to
phosphorylate 17 different substrates. Nearly all of the kinases tested (93%) exhibited
activities that were at least five-fold higher than controls, on one or more substrates,
including 18 of 24 previously uncharacterized kinases. Thirty-two kinases exhibited
preferential phosphorylation of one or two substrates. Twenty-seven kinases readily
phosphorylated poly(Tyr-Glu). Since only five of these kinases were previously classified
as dual function kinases (i.e., they phosphorylate both Ser/Thr and Tyr), these findings
greatly expand our knowledge as to which kinases are able to phosphorylate tyrosine
residues. Interestingly, these dual specificity kinases often share common amino acid
residues that lie near the catalytic region. These results indicate that the protein chip
technology of the present invention is useful for high throughput screening of protein 
biochemical activity, and for the analysis of entire proteomes. 

B. Methods

1. Cell Culture, Constructs and Protein Purification 



Using the recombination strategy of Hudson et al., 119 of 122 yeast protein kinase
genes were cloned into a high copy URA3 expression vector (pEG(KG)), which produces
GST fusion proteins under the control of the galactose-inducible GAL10 promoter.
Briefly, primers complementary to the end of each ORF were purchased from Research
Genetics; the ends of these primers contain a common 20 bp sequence. In a second round
of PCR, the ends of these products were modified by adding sequences that are homologous
to the vector. The PCR products containing the vector sequences at their ends were
transformed along with the vector into a pep4 yeast strain (which lacks several yeast
proteases), and Ura+ colonies were selected. Plasmids were rescued in E. coli, verified by
restriction enzyme digestion, and the DNA sequence spanning the vector-insert junction was
determined using a primer complementary to the vector. For the GST::Cla4 construct, a
frame-shift mutation was found in a poly(A) stretch in the amino terminal coding region.
Three independent clones were required to find the correct one that maintained reading
frame. For five of these genes, two overlapping PCR products were obtained and
introduced into yeast cells. Confirmed plasmids were reintroduced into the pep4 yeast 
strain for kinase protein purification. 

For preparing samples using the 96 well format, 0.75 ml of cells were grown in 
medium containing raffinose to O.D.(600) about 0.5 in boxes containing 2 ml wells; two
wells were used for each strain. Galactose was added to a final concentration of 4% to
induce protein expression, and the cells were incubated for 4 hrs. The cultures of the same
strain were combined, washed once with 500 µl of lysis buffer, resuspended in 200 µl of
lysis buffer, and transferred into a 96 x 0.5 ml plate (Dot Scientific, USA) containing 100
µl chilled glass beads. Cells were lysed in the box by repeated vortexing at 4 °C and the
GST fusion proteins were purified from these strains using glutathione beads and standard
protocols in a 96 well format. The purity of five purified GST::kinase proteins (Swe1,
Ptk2, Pkh1, Hog1, Pbs2) was determined by comparing the Coomassie staining patterns of
the purified proteins with the patterns obtained by immunoblot analysis using anti-GST
antibodies. The results indicated that the purified proteins were more than 90% pure. To
purify the activated form of Hog1, the cells were challenged with 0.4 M NaCl in the last
five minutes of the induction. Protein kinase activity was stable for at least two months at 
-70°C with little or no loss of kinase activity. 

2. Chip Fabrication and Protein Attachment 


Chips were made from the silicone elastomer, polydimethylsiloxane (PDMS) (Dow 
Chemical, USA), which was cast over microfabricated molds. Liquid PDMS was poured 
over the molds and, after curing (at least 4 hours at 65 °C), flexible silicone elastomer array
sheets were then peeled from the reusable molds. Although PDMS can be readily cast over
microlithographically fabricated structures, for the purposes of the kinase assay described
herein, molds made from sheets of acrylic patterned with a computer-controlled laser
milling tool (Universal Laser Systems, USA) sufficed. 

Over 30 different arrays were tested. The variables tested were width and depth of 
the wells (widths ranging from 100 µm to 2.5 mm, depths from 100 µm to 1 mm), spacing
between wells (100 µm to 1 mm), configuration (either rectangular arrays or closest
packed), and well shape (square versus round). The use of laser milled acrylic molds
offered a fast and inexpensive method to realize a large number of prototype molds of varying
parameters. 

To determine the conditions that maximize protein attachment to the wells, PDMS 
was treated with either 5 M H2SO4, 10 M NaOH, hydrogen peroxide or a
3-glycidooxypropyltrimethoxysilane linker (GPTS) (Aldrich, USA). GPTS treatment
resulted in the greatest adsorption of protein to the wells relative to untreated PDMS or
PDMS treated in other ways. Briefly, after washing with 100% EtOH three times at room
temperature, the chips were immersed in 1% GPTS solution (95% EtOH, 16 mM HOAc)
with shaking for 1 hr at room temperature. After three washes with 95% EtOH, the chips
were cured at 135 °C for 2 hrs under vacuum. Cured chips can be stored in dry argon for
months. To attach proteins to the chips, protein solutions were added to the wells and
incubated on ice for 1 to 2 hours. After rinsing with cold HEPES buffer (10 mM HEPES, 
100 mM NaCl, pH 7.0) three times, the wells were blocked with 1% BSA in PBS (Sigma, 
USA) on ice for > 1 hr. Because of the use of GPTS, any reagent containing primary amine 
groups was avoided. 

To determine the concentration of proteins that can be linked to the treated PDMS, 
horseradish peroxidase (HRP) anti-mouse Ig (Amersham, USA) was attached to the chip 
using serial dilutions of the enzyme. After extensive washing with PBS, the bound 
antibodies were detected using an enhanced chemiluminescent (ECL) detection method 
(Amersham, USA). Up to 8 x 10⁻⁹ µg/µm² of protein can be attached to the surface; a
minimum of 8 x 10⁻¹³ µg/µm² is required for detection by our immunostaining methods.

3. Immunoblotting, Kinase Assay and Data Acquisition 



Immunoblot analysis was performed as described. GST::protein kinases were
tested for in vitro kinase activity using ³³P-γ-ATP. In the autophosphorylation assay, the
GST::kinases were directly adhered to GPTS-treated PDMS and the in vitro reactions
carried out with ³³P-γ-ATP in appropriate buffer. In the substrate reactions, the substrate
was adhered to the wells via GPTS, and the wells were washed with HEPES buffer and
blocked with 1% BSA, before kinase, ³³P-γ-ATP and buffer were added. The total reaction
volume was kept below 0.5 µl per reaction. After incubation for 30 minutes at 30 °C, the
chips were washed extensively, and exposed to both X-ray film and a Molecular Dynamics
PhosphorImager, which has a resolution of 50 µm and is quantitative. For twelve substrates
each kinase assay was repeated at least twice; for the remaining five, the assays were 
performed once.

To determine substrate specificity, a specificity index (SI) was calculated using the
following formula: $SI_{ir} = F_{ir} / [(F_{i1} + F_{i2} + \cdots + F_{ir}) / r]$, where i represents the ID of
kinase used, r represents the ID of a substrate, and $F_{ir}$ represents the fold increase of a
kinase i on substrate r compared with GST alone.
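
The specificity index above can be sketched in a few lines of code. This is an illustrative reading of the formula, not the authors' software: it assumes that the denominator is the mean fold increase of one kinase over all substrates tested, and the function name specificity_index and the example values are hypothetical.

```python
def specificity_index(fold_increases):
    """Return the specificity index of one kinase for each substrate.

    fold_increases holds the fold increase over GST alone for a single
    kinase, one value per substrate. Each SI is that substrate's fold
    increase divided by the kinase's mean fold increase (an assumed
    reading of the formula's denominator).
    """
    if not fold_increases:
        raise ValueError("at least one substrate is required")
    mean_fold = sum(fold_increases) / len(fold_increases)
    if mean_fold == 0:
        raise ValueError("mean fold increase is zero; SI is undefined")
    return [fold / mean_fold for fold in fold_increases]


# Hypothetical kinase tested on three substrates.
example_folds = [1.0, 2.0, 9.0]
si_values = specificity_index(example_folds)
```

Under this reading, an SI well above 1 marks a substrate that a kinase phosphorylates preferentially relative to its average activity, which is how preferential phosphorylation of one or two substrates would show up.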


4. Kinase Sequence Alignments and Phylogenetic Trees 

Multiple sequence alignments based on the core kinase catalytic domain 
subsequences of the 107 protein kinases were generated with the CLUSTAL W algorithm,
using the Gonnet 250 scoring matrix. Kinase catalytic domain sequences were obtained
from the SWISS-PROT, PIR and GenBank databases. For those kinases whose
catalytic domains are not yet annotated (DBF4/YDR052C and SLN1/YIL147C), probable
kinase subsequences were inferred from alignments with other kinase subsequences in the
data set with the FASTA algorithm using the BLOSUM 50 scoring matrix. Protein
subsequences corresponding to the eleven core catalytic subdomains were extracted from
the alignments, and the phylogenetic trees were computed with the PROTPARS program
(Figure 5a). 

5. Functional Grouping of Protein Chip Data 


To visualize the approximate functional relationships between protein kinases
relative to the experimental data, kinases were hierarchically ordered based on their ability
to phosphorylate the 12 different substrates (data available on web site
http://bioinfo.mbb.yale.edu/genome/yeast/chip as of August 17, 2000). A profile
corresponding to the -/+ activity of the 107 protein kinases to each of the substrates was
recorded, with discretized values in [0,1]. Matrices were derived from the pairwise
Hamming distances between experimental profiles, and unrooted phylogenies were
computed using the Fitch-Margoliash least-squares estimation method as implemented in
the FITCH program³⁴ of the PHYLIP software package. In each case, the input order of
taxa was randomized to negate any inherent bias in the organization of the data set, and
optimal hierarchies were obtained through global rearrangements of the tree structures. 
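
The derivation of a distance matrix from discretized activity profiles can be illustrated as follows. This is a minimal sketch under stated assumptions: the kinase names and profiles are invented, the helper names hamming and distance_matrix are hypothetical, and the tree-building step performed with FITCH is not reproduced.

```python
def hamming(profile_a, profile_b):
    """Count the substrates on which two 0/1 activity profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same substrates")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def distance_matrix(profiles):
    """Return sorted kinase names and their pairwise Hamming distances."""
    names = sorted(profiles)
    matrix = [[hamming(profiles[p], profiles[q]) for q in names] for p in names]
    return names, matrix


# Hypothetical -/+ (0/1) profiles for three kinases on four substrates.
profiles = {
    "kinaseA": [1, 0, 1, 1],
    "kinaseB": [1, 1, 1, 0],
    "kinaseC": [0, 0, 0, 0],
}
names, matrix = distance_matrix(profiles)
```

A symmetric matrix of this kind, with zeros on the diagonal, is the sort of input a least-squares distance method would take; kinases with small pairwise distances would be expected to group together.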

C. Results 

1. Yeast Kinase Cloning and Protein Purification

Using a recombination-directed cloning strategy, we attempted to clone the entire
coding regions of 122 yeast protein kinase genes in a high copy expression vector
(pEG(KG)) that produces GST fusion proteins under the control of the galactose-inducible
GAL10 promoter (Figure 1a). GST::kinase constructs were rescued into E. coli, and
sequences at the 5'-end of each construct were determined. Using this strategy, 119 of the
122 yeast protein kinase genes were cloned in-frame. The three kinase genes that were not
cloned are very large (4.5-8.3 kb). 

The GST::kinase fusion proteins were overproduced in yeast and purified from 50 ml
cultures using glutathione beads and standard protocols. In the case of Hog1, the yeast
cells were treated with high salt to activate the enzyme in the last five minutes of induction;
for the rest of the kinases, synthetic media (URA-/raffinose) was used. Immunoblot
analysis of all 119 fusions using anti-GST antibodies revealed that 105 of the yeast strains
produced detectable GST::fusion proteins; in most cases the fusions were full length. Up to
1 µg of fusion protein per ml of starting culture was obtained (Figure 1b). However, 14 of
119 GST::kinase samples were not detected by immunoblotting analysis. Presumably,
these proteins are not stably overproduced in the pep4 protease-deficient strain used, or
these proteins may form insoluble aggregates that do not purify using our procedures.
Although this procedure was successful, purification of GST fusion proteins using 50 ml
cultures is a time-consuming process and not applicable for preparing thousands of samples.
Therefore, a procedure for growing cells in a 96 well format was developed (see Methods).
Using this procedure, 119 GST fusions were prepared and purified in six hours with about 
two-fold higher yields per ml of starting culture relative to the 50 ml method. 

2. Protein Chip Design 


Protein chips were developed to conduct high throughput biochemical assays of 119
yeast protein kinases (Figure 2). These chips consist of an array of wells in a disposable
silicone elastomer, polydimethylsiloxane (PDMS). Arrays of wells allow small volumes of
different probes to be densely packed on a single chip yet remain physically segregated
during subsequent batch processing. Proteins were covalently attached to the wells using a
linker, 3-glycidooxypropyltrimethoxysilane (GPTS). Up to 8 x 10⁻⁹ µg/µm² of protein can
be attached to the surface (see Methods). 

For the purposes of the protein kinase assays, the protein chip technology was 
configured to be compatible with standard sample handling and recording equipment.
Using radioisotope labeling (³³P), the kinase assays described below, and manual loading, a
variety of array configurations were tested. The following chips produced the best results:
round wells with 1.4 mm diameter and 300 µm deep (approximately 300 nl), in a 10x14
rectangular array configuration with a 1.8 mm pitch. A master mold of twelve of these
arrays was produced, and a large number of arrays were repeatedly cast for the protein
kinase analysis. Chips were placed atop microscope slides for handling purposes (Figure
2a); the arrays covered slightly more than one third of a standard microscope slide and two
arrays per slide were typically used (Figure 2b). Although a manual pipette method to place
proteins in each well was employed, automated techniques may also be used. In addition,
this protein chip configuration may also be used with other labeling methods, such as by
using fluorescently labeled antibodies to phosphoproteins, and subsequent detection of
immunofluorescence. 

3. Large-scale Kinase Assays Using Protein Chips 

All 119 GST::protein kinases were tested for in vitro kinase activity in 17 different
assays using ³³P-γ-ATP and 17 different chips. Each chip was assayed using a different
substrate, as follows: 1) Autophosphorylation, 2) Bovine Histone H1 (a common kinase
substrate), 3) Bovine Casein (a common substrate), 4) Myelin basic protein (a common
substrate), 5) Axl2 C terminus-GST (Axl2 is a transmembrane phosphoprotein involved in
budding), 6) Rad9 (a phosphoprotein involved in the DNA damage checkpoint), 7) Gic2
(a phosphoprotein involved in budding), 8) Red1 (a meiotic phosphoprotein important for
chromosome synapsis), 9) Mek1 (a meiotic protein kinase important for chromosome
synapsis), 10) Poly(tyrosine-glutamate 1:4) (poly(Tyr-Glu); a tyrosine kinase substrate),
11) Ptk2 (a small molecule transport protein), 12) Hsl1 (a protein kinase involved in cell
cycle regulation), 13) Swi6 (a phosphotranscription factor involved in G1/S control), 14)
Tub4 (a protein involved in microtubule nucleation), 15) Hog1 (a protein kinase involved
in osmoregulation), 16) Hog1 (an inactive form of the kinase), and 17) GST (a control).
For the autophosphorylation assay, the kinases were directly adhered to the treated PDMS
wells and ³³P-γ-ATP was added; for substrate reactions, the substrates were bound to the
wells, and then kinases and ³³P-γ-ATP were added. After the reactions were completed, the
slides were washed and the phosphorylation signals were acquired and quantified using a
high resolution phosphoimager. Examples are shown in Figure 3. To identify kinase
activities, the quantified signals were converted into fold increases relative to GST controls 
and plotted for further analysis (Figure 4a). 
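
For illustration purposes only, the conversion of quantified signals into fold increases 
relative to a GST control may be sketched as follows. The signal values, the kinase names 
and the function name are hypothetical, and the use of a single control value per chip is 
an assumption rather than a description of the actual analysis. 

```python
# Convert quantified phosphoimager signals into fold increases over a control.
# ASSUMPTIONS: one GST control signal per chip; all numbers are invented.

def fold_over_control(signals, control_signal):
    """Return {kinase: signal / control_signal} for one chip."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return {kinase: value / control_signal for kinase, value in signals.items()}


# Hypothetical quantified signals for two wells on one chip.
example_signals = {"kinase_A": 4000.0, "kinase_B": 150.0}
example_fold = fold_over_control(example_signals, control_signal=100.0)

# Flag wells whose signal is at least five-fold over the control.
active = [kinase for kinase, fold in example_fold.items() if fold >= 5.0]
print(example_fold, active)
```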

As shown in Figure 4a, most (93.3%) kinases exhibited activity five-fold or greater 
over background for at least one substrate. As expected, Hrr25, Pbs2 and Mek1 
phosphorylated their known substrates, Swi6 (400-fold higher than the GST control), 
Hog1 (10-fold higher) and Red1 (10-fold higher), respectively. The results of this assay 
demonstrated that 18 of the 24 predicted protein kinases that have not been studied 
previously phosphorylate one or more substrates, as do several unconventional kinases, 
including the histidine kinases (Sbil, Yil042c) and phospholipid kinases (e.g., Mec1). 

To determine substrate specificity, the activity of a particular kinase was further 
normalized against the average of its activity against all substrates. Several examples are 
shown in Figure 4b; all the data are available at 
http://bioinfo.mbb.yale.edu/genome/yeast/chip. Thirty-two kinases exhibited substrate 
specificity on a particular substrate with a specificity index (SI; see Methods) equal to or 
higher than 2, and reciprocally, most substrates are preferentially phosphorylated by a 
particular protein kinase or set of kinases. For example, the C terminus of Axl2, a protein 
involved in yeast cell budding, is preferentially phosphorylated by Dbf20, Kin2, Yak1 and 
Ste20 relative to other proteins. Interestingly, previous studies found that Ste20 was 
localized at the tip of emerging buds similar to Axl2, and a ste20jii/cla4^ mutant is unable 
to bud or form fully polarized actin patches or cables. Another example is the 
phosphoprotein Gic2, which is also involved in budding. Ste20 and Skm1 strongly 
phosphorylate Gic2 (Figure 4b). Previous studies suggested that Cdc42 interacts with Gic2, 
Cla4, Ste20 and Skm1. Our results raise the possibility that Cdc42 may function to promote 
the phosphorylation of Gic2 by recruiting Ste20 and/or Skm1. 
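
For illustration purposes only, the normalization described above may be sketched as 
follows. The exact definition of the specificity index is given in the Methods, which are 
not reproduced here; this sketch assumes that the index for a substrate is the activity of 
the kinase on that substrate divided by its mean activity across all substrates. All values 
and names are hypothetical. 

```python
# Specificity index (SI) sketch for a single kinase.
# ASSUMPTION: SI = activity on one substrate / mean activity over all substrates.

def specificity_index(fold_by_substrate):
    """Return {substrate: fold / mean fold across all substrates}."""
    mean_fold = sum(fold_by_substrate.values()) / len(fold_by_substrate)
    if mean_fold == 0:
        raise ValueError("mean activity is zero; SI is undefined")
    return {name: fold / mean_fold for name, fold in fold_by_substrate.items()}


# Hypothetical fold-over-control activities of one kinase on four substrates.
example_activity = {
    "substrate_1": 30.0,
    "substrate_2": 6.0,
    "substrate_3": 6.0,
    "substrate_4": 6.0,
}
si = specificity_index(example_activity)

# Substrates on which the kinase shows an SI of 2 or higher.
preferred = [name for name, value in si.items() if value >= 2.0]
print(si, preferred)
```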

4. Yeast Contain Many Dual Specific Kinases 

Of particular interest are the dual specificity kinases, i.e., those enzymes that 
phosphorylate both Ser/Thr and tyrosine. Based on sequence analysis, all but two yeast 
protein kinases belong to the Ser/Thr family of protein kinases; however, at the time of the 
study, seven protein kinases (Mps1, Rad53, Swe1, Ime2, Ste7, Hrr25, and Mck1) were 
reported to be dual specificity kinases. We confirmed that Swe1, Mps1, Ime2, and Hrr25 
readily phosphorylate poly(Tyr-Glu), but we did not detect any tyrosine kinase activity for 
Ste7, Rad53 or Mck1. Mck1 did not show strong activity in any of our assays; however, 
Ste7 and Rad53 are very active in other assays. Thus, their inability to phosphorylate 
poly(Tyr-Glu) indicates that they are either very weak tyrosine kinases in general or at least 
are weak with the poly(Tyr-Glu) substrate. Consistent with the latter possibility, others 
have found that poly(Tyr-Glu) is a very poor substrate for Rad53 (Ref 19; D. Stern, pers. 
comm.). Interestingly, we found that 23 other kinases also efficiently use poly(Tyr-Glu) as 
a substrate, indicating that there are at least 27 kinases in yeast that are capable of acting in 
vitro as dual specificity kinases. One of these, Riml, was recently shown to phosphorylate 
a Tyr residue on its in vivo substrate, Ime2, indicating that it is a bona fide dual specificity 
kinase. In summary, this experiment roughly tripled the number of kinases capable of 
acting as dual specificity kinases, and has raised questions about some of those classified as 
such kinases. 

5. Correlation Between Functional Specificity and Amino Acid 
Sequences of the Poly(Tyr-Glu) Kinases 

The large-scale analysis of yeast protein kinases allows us to compare the functional 
relationship of the protein kinases to one another. We found that many of the kinases that 
phosphorylate poly(Tyr-Glu) are related to one another in their amino acid sequences: 70% 
of the poly(Tyr-Glu) kinases cluster into four distinct groups on a dendrogram in which the 
kinases are organized relative to one another based on sequence similarity of their 
conserved protein kinase domains (Figure 5a). Further examination of the amino acid 
sequence reveals four types of amino acids that are preferentially found in the 
poly(Tyr-Glu) class of kinases relative to the kinases that do not use poly(Tyr-Glu) as a 
substrate (three are lysines and one is a methionine); one residue (an asparagine) was 
preferentially located in the kinases that do not readily use poly(Tyr-Glu) as a substrate 
(Figure 5b). Most of the residues lie near the catalytic portion of the molecule (Figure 5b), 
suggesting that they may play a role in substrate recognition. 

D. Discussion 

1. Large-scale Analysis of Protein Kinases 

This study employed a novel protein chip technology to characterize the activities of 
119 protein kinases for 17 different substrates. We found that particular proteins are 
preferred substrates for particular protein kinases, and vice versa, many protein kinases 
prefer particular substrates. One concern with these studies is that it is possible that kinases 
other than the desired enzyme are contaminating our preparations. Although this cannot be 
rigorously ruled out, analysis of five of our samples by Coomassie staining and immunoblot 
staining with anti-GST does not reveal any detectable bands in our preparation that are not 
GST fusions (see Methods). 

It is important to note that in vitro assays do not ensure that a substrate for a 
particular kinase in vitro is phosphorylated by the same kinase in vivo. Instead, these 
experiments indicate that certain proteins are capable of serving as substrates for specific 
kinases, thereby allowing further analysis. In this respect, these assays are analogous to 
two-hybrid studies in which candidate interactions are detected. Further experimentation is 
necessary to determine if the processes normally occur in vivo. 

Consistent with the idea that many of the substrates are likely to be bona fide 
substrates in vivo is the observation that three kinases, Hrr25, Pbs2 and Mek1, 
phosphorylate their known substrates in our assays. Furthermore, many of the kinases 
(e.g., Ste20) co-localize with their in vitro substrates (e.g., Axl2). Thus, we expect that 
many of the kinases that phosphorylate substrates in our in vitro assays are likely also to 
do so in vivo. 

Although most of the kinases were active in our assays, several were not. 
Presumably, our preparations of these latter kinases either lack sufficient quantities of an 
activator or were not purified under activating conditions. For example, Cdc28, which was 
not active in our assays, might be lacking its activating cyclins. For the case of Hog1, cells 
were treated with high salt to activate the enzyme. Since nearly all of our kinase 
preparations did exhibit activity, we presume that at least some of the enzyme in the 
preparation has been properly activated and/or contains the necessary cofactors. It is likely 
that the overexpression of these enzymes in their native organism contributes significantly 
to the high success of obtaining active enzymes. 

Using the assays on the protein chip, many kinases that utilize poly(Tyr-Glu) were 
identified. The large-scale analysis of many kinases allowed the novel approach of 
correlating functional specificity of poly(Tyr-Glu) kinases with specific amino acid 
sequences. Many of the residues preferentially found in the kinases that phosphorylate 
poly(Tyr-Glu) are basic. This might be expected if there were electrostatic interactions 
between the kinase residues and the Glu residues. However, the roles of some of the other 
residues are not obvious, such as the Met residues on the kinases that phosphorylate 
poly(Tyr-Glu) and the Asn on those that do not. These kinase residues may confer substrate 
specificity by other mechanisms. Regardless, analysis of additional substrates should allow 
further correlation of functional specificity with protein kinase sequence for all protein 
kinases. 

2. Protein chip technology 

In addition to the rapid analysis of large numbers of samples, the protein chip 
technology described here has significant advantages over conventional methods. 1) The 
chip-based assays have high signal-to-noise ratios. We found that the signal-to-noise ratio 
exhibited using the protein chips is much better (>10 fold) than that observed for traditional 
microtiter dish assays (data not shown). Presumably this is due to the fact that ^^P-γ-ATP 
does not bind the PDMS as much as microtiter dishes. 2) The amount of material needed is 
very small. Reaction volumes are 1/20 to 1/40 of those used in 384-well microtiter 
dishes; less than 20 ng of protein kinase was used in each reaction. 3) The enzymatic assays 
using protein chips are extremely sensitive. Even though only 105 fusions were detectable 
by immunoblot analysis, 112 exhibited enzymatic activity greater than five-fold over 
background for at least one substrate. For example, Mps1 consistently exhibits the 
strongest activity in many of the kinase assays even though we have not been able to detect 
this fusion protein by immunoblot analysis (see Figures 1b and 3a). 4) Finally, the chips are 
inexpensive; the material costs less than eight cents for each array. The microfabricated 
molds are also easy to make and inexpensive. 

In addition to the analysis of protein kinases, this protein chip technology is also 
applicable to a wide variety of additional assays, such as ATP and GTP binding assays, 
nuclease assays, helicase assays and protein-protein interaction assays. Recently, in an 
independent study, Phizicky and coworkers expressed yeast proteins as GST fusions under 
the much weaker CUP1 promoter. Although the quality of their clones has not been 
established, they were able to identify biochemical activities using pools of yeast strains 
containing the fusion proteins. The advantage of our protein chip approach is that all 
samples can be analyzed in a single experiment. Furthermore, although this study used 
wells, which have the advantage of segregating samples, flat PDMS chips and glass slides 
can also be used for different assays; these have the advantage that they can be used with 
standard pinning tool microarrayers. This technology can also be applied to facilitate 
high-throughput drug screening in which one can screen for compounds that inhibit or 
activate enzymatic activities of any gene products of interest. Since these assays will be 
carried out at the protein level, the results will be more direct and meaningful to the 
molecular function of the protein. 

We configured the protein chip technology for a specific protein kinase assay using 
commonly available sample handling and recording equipment. For this purpose, array 
dimensions remained relatively large compared to dimensions readily available with 
microfabricated silicone elastomer structures. We have cast PDMS structures with feature 
sizes two orders of magnitude smaller than those reported here using microlithographically 
fabricated molds, while others have reported submicron feature sizes in microfabricated 
structures. These results indicate that well densities of microfabricated protein chips can 
be readily increased by several orders of magnitude. The protein chip technology reported 
here is readily scalable. 
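
For illustration purposes only, the scaling of well density with feature size may be 
sketched as follows. The sketch assumes a square grid of wells and assumes that the pitch 
shrinks in proportion to the feature size; both are simplifying assumptions, and the 
function name is hypothetical. 

```python
# Well density as a function of pitch for a square grid of wells.
# ASSUMPTIONS: square grid; pitch scales in proportion to feature size.

def wells_per_cm2(pitch_mm):
    """Return wells per square centimeter for a square grid at the given pitch."""
    pitch_cm = pitch_mm / 10.0
    return 1.0 / (pitch_cm ** 2)


current_density = wells_per_cm2(1.8)          # pitch used in this study
scaled_density = wells_per_cm2(1.8 / 100.0)   # pitch two orders of magnitude smaller
density_gain = scaled_density / current_density

print(f"{current_density:.1f} -> {scaled_density:.0f} wells/cm^2 "
      f"(gain of about {density_gain:.0f}x)")
```

Because density varies as the inverse square of the pitch, a hundred-fold reduction in 
pitch corresponds under these assumptions to a roughly ten-thousand-fold increase in 
well density. 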

In conclusion, an inexpensive, disposable protein chip technology was developed for 
high throughput screening of protein biochemical activity. Utility was demonstrated 
through the analysis of 119 protein kinases from Saccharomyces cerevisiae assayed for 
phosphorylation of 17 different substrates. These protein chips permit the simultaneous 
measurement of hundreds of protein samples. The use of microfabricated arrays of wells as 
the basis of the chip technology allows array densities to be readily increased by several 
orders of magnitude. With the development of appropriate sample handling and 
measurement techniques, these protein chips can be adapted for the simultaneous assay of 
several thousand to millions of samples. 

VII. Example II: Analysis of Yeast Protein Kinase Activity Using Protein Chips 

A. Introduction 

The following example presents three protocols that, for illustration purposes only, 
provide different methods of using the protein chips of the present invention to assay for 
protein kinase activity. 

1. Assay Methods for Protein Kinase Activity 

i. Autophosphorylation Activity 

(1) Protein chips were washed three times with 100% EtOH at room 
temperature. The chips were then coated with the linker GPTS (1% in 95% EtOH) at room 
temperature for one hour with shaking. After washing with 100% EtOH three times, the 
chips were dried at 130°C for 1.5 hours under vacuum. 

(2) GST::yeast protein kinases, one kinase species per well, were bound 
to the wells of the protein chip by incubation for at least one hour. The chip was further 
blocked by 1% BSA. 

(3) Kinase buffer and a ^^P-γ-ATP probe were added to each well, and 
incubated at 30°C for 30 minutes. The chip was washed extensively after the 
phosphorylation reaction was completed. 

(4) The specific ^^P-γ-ATP signal, representing autophosphorylation, was 
detected and quantified by a phosphoimager. 



ii. Kinase Activity - Protocol I 

(1) Protein chips were washed three times with 100% EtOH at room 
temperature. The chips were then coated with the linker GPTS (1% in 95% EtOH) at room 
temperature for one hour with shaking. After washing with 100% EtOH three times, the 
chips were dried at 130°C for 1.5 hours under vacuum. 

(2) A substrate (for example, GST::yeast protein) was bound to the chips 
by incubation for one or more hours. The chip was further blocked by 1% BSA, and the 
chip was washed. 

(3) A different protein kinase was added to each well of the protein chip, 
along with kinase buffer and ^^P-γ-ATP, and incubated at 30°C for 30 minutes. The protein 
chip was washed extensively after the phosphorylation reaction was completed. 

(4) The specific ^^P-γ-ATP signal, representing phosphorylation of the 
substrate protein by the protein kinase probe, was detected and quantified by a 
phosphoimager. 



iii. Kinase Activity - Protocol II 

(1) Protein chips were washed three times with 100% EtOH at room 
temperature. The chips were then coated with the linker GPTS (1% in 95% EtOH) at room 
temperature for one hour with shaking. After washing with 100% EtOH three times, the 
chips were dried at 130°C for 1.5 hours under vacuum. 

(2) A substrate (for example, GST::yeast protein) was bound to the chips 
by incubation for one or more hours. The chip was further blocked by 1% BSA and the 
chip was washed. 

(3) A different protein kinase was added to each well of the protein chip, 
along with kinase buffer and P-γ-ATP, and incubated at 30°C for 30 minutes. The protein 
chip was washed extensively after the phosphorylation reaction was completed. The chip 
was incubated with iodoacetyl-LC-biotin in the dark at room temperature overnight. 

(4) After washing, the chip was probed with fluorescent-labeled avidin to 
detect the phosphorylation signals. 

(5) The chip was then scanned using an Axon Genepix 4000A scanner, 
which was modified with a lens having an increased depth of focus of about 300-400 
microns. The modifications allow scanning of surfaces mounted on a slide (e.g., the PDMS 
microarrays of the present invention), which would otherwise be out of the plane of focus. 
Using the modified Axon Genepix 4000A scanner, the arrays were scanned to acquire and 
quantify fluorescent signals. 

VIII. Example III: Analysis of Protein-Protein Interactions Using Protein Chips 

A protein of interest ("probe protein") is recombinantly expressed in and purified 
from E. coli as a labeled fusion protein using standard protocols. The target proteins are 
attached to the wells of the chip, with a different target protein in each well. The purified 
probe protein is introduced into each well of the chip, and incubated for several hours or 
more. The chip is washed and probed with either: a) antibodies to the probe protein, or b) 
antibodies to the label on the fusion protein. The antibodies are labeled with a fluorescent 
label, such as Cy3 or Cy5, or are detected using a fluorescently labeled secondary antibody 
that detects the first antibody. 

The following examples provide, for illustration purposes only, methods of using 
the protein chips of the present invention to assay for proteases, nucleases, or G-protein 
receptors. Protein-protein interactions generally can be assayed using the following or a 
similar method. 

A. Analysis of Protease Activity 

Protease activity is assayed in the following way. First, protein probes are prepared 
consisting of various combinations of amino acids, with a C-terminal or N-terminal mass 
spectroscopic label attached, with the only proviso being that the molecular weight of the 
label should be sufficiently large so that all labeled cleavage products of the protein can be 
detected. The protein probe is contacted with proteases attached to a protein chip at 37°C. 
After incubation at 37°C for an appropriate period of time, and washing with acetonitrile 
and trifluoroacetic acid, protease activity is measured by detecting the proteolytic products 
using mass spectrometry. This assay provides information regarding both the proteolytic 
activity and specificity of the proteases attached to the protein chip. 

Another rapid assay for protease activity analysis is to attach proteins of known 
sequence to the chip. The substrate proteins are fluorescently labeled at the end not 
attached to the chip. Upon incubation with the protease(s) of interest, the fluorescent label 
is lost upon proteolysis, such that decreases in fluorescence indicate the presence and extent 
of protease activity. This same type of assay can be carried out wherein the protein 
substrates are attached to beads placed in the wells of the chips. 

B. Analysis of Nuclease Activity 

Nuclease activity is assessed in the same manner as described for protease activity, 
above, except that nucleic acid probes/substrates are substituted for protein 
probes/substrates. As such, fluorescently tagged nucleic acid fragments that are released by 
nuclease activity can be detected by fluorescence, or the nucleic acid fragments can be 
detected directly by mass spectrometry. 

C. Analysis of G-Protein Coupled Receptors 

In another type of assay, compounds that bind G-protein coupled receptors are 
identified. Initially, the G-protein receptor is cloned as a GST fusion protein, with the GST 
portion attached to the C terminus of the G-protein because the C terminus is generally not 
involved with determining probe specificity. The G-protein::GST fusion proteins are 
attached to the wells, preferably by association with glutathione. The G-protein receptors 
are then incubated with a mixture of compounds, such as a combinatorial chemical library 
or a peptide library. After washing, bound probes are eluted, for example by the addition of 
25% acetonitrile/0.05% trichloroacetic acid. The eluted material is then loaded into a 
MALDI mass spectrometer and the nature of the bound probes identified. 

IX. Example IV: Analysis of Protein Kinase Inhibition by Specific Inhibitors 
Using Protein Chips 

The following description provides, for exemplary purposes only, methods of using 
the protein chips of the present invention to examine protein kinases for sensitivity to 
protein kinase inhibitors. Protein-protein interactions generally can be assayed using the 
following or a similar method. 

Substrates were bound to the surface of the GPTS-treated microwells on the protein 
chip at room temperature for one hour, then blocked with 1% BSA and 100 mM Tris pH 
7.5, and washed three times with TBS buffer. Kinases and different concentrations of 
kinase inhibitors were added to the microwells in the presence of ^^P-γ-ATP. The 
phosphorylation reaction was carried out at 30°C for thirty minutes. After completion of the 
reaction, the protein chip was washed extensively with TBS buffer at room temperature, and 
then allowed to dry. Phosphorylation signals were obtained by exposing the protein chip to 
either X-ray film or a phosphoimager. 

A human protein kinase A (PKA), a human MAP kinase (MAPK), three yeast PKA 
homologs (TPK1, TPK2 and TPK3), and two other yeast protein kinases (HSL1 and RCK1) 
were tested against two substrates (i.e., a protein substrate for PKA and a commonly used 
kinase substrate, MBP) using different concentrations of PKIα (a specific human PKA 
inhibitor) or SB202190 (a MAPK inhibitor). As shown in Figure 7, PKIα specifically 
inhibited PKA activities on both peptide and MBP substrates. However, PKIα did not 
inhibit the three yeast PKA homologs (TPK1, TPK2, TPK3) or the other two yeast protein 
kinases tested (HSL1 and RCK1). In addition, SB202190 did not inhibit PKA activity. 
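
For illustration purposes only, phosphorylation signals obtained at different inhibitor 
concentrations may be expressed as a percentage of the uninhibited signal, as sketched 
below. The signal values, the background value, the concentration units and all names 
are hypothetical; this is not the analysis used to prepare Figure 7. 

```python
# Express kinase activity at each inhibitor concentration as a percentage of
# the uninhibited (zero-inhibitor) signal after background subtraction.
# ASSUMPTION: all numbers below are invented for illustration.

def percent_of_uninhibited(signal, uninhibited_signal, background=0.0):
    """Return background-corrected signal as a percentage of the uninhibited signal."""
    reference = uninhibited_signal - background
    if reference <= 0:
        raise ValueError("uninhibited signal must exceed background")
    return 100.0 * (signal - background) / reference


# Hypothetical series: inhibitor concentration (arbitrary units) -> signal.
hypothetical_series = {0.0: 1000.0, 1.0: 520.0, 10.0: 130.0}
remaining_activity = {
    conc: percent_of_uninhibited(sig, hypothetical_series[0.0], background=20.0)
    for conc, sig in hypothetical_series.items()
}
print(remaining_activity)
```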

X. Example V: Kinase Assays on a Glass Surface 

1. Glass slides (Fisher, USA) were soaked in 28-30% ammonium hydroxide 
overnight at room temperature ("RT") with shaking. 

2. The slides were rinsed with ultra-pure water four times for 5 minutes ("min") 
each, then rinsed with a large volume of 100% ethanol ("EtOH") to completely remove the 
water. Slides were then rinsed with 95% ethanol three times. 

3. The slides were immersed in 1% 3-glycidoxypropyltrimethoxysilane (GPTS) 
solution in 95% EtOH, 16 mM acetic acid ("HOAc") with shaking for 1 hr at room 
temperature. The slides were rinsed with 95% ethanol three times at RT. 

4. The slides were cured at 135°C for 2 hrs under vacuum. After cooling, the 
slides can be stored in argon for months before use. 

5. Approximately 10 µl of each protein substrate (in 40% glycerol) was 
arrayed onto a 96-well PCR plate on ice. A manual spotting device (V&P Scientific, USA) 
was used to spot approximately 3 nl of each of the samples onto the GPTS-treated glass 
slide at RT. In one embodiment, 768 samples are spotted on a single slide. The slides were 
incubated in a covered and clean chamber at RT for one hour. 

6. A slide was blocked with 10 ml blocking buffer (100 mM glycine, 100 mM 
Tris, pH 8.0, 50 mM NaCl) at RT for one hour. The slides were washed with TBS buffer 
(50 mM Tris, pH 8.0, 150 mM NaCl) three times and spun to dryness at 1500 rpm for 5 
min. 

7. The substrate surfaces on the slides were covered with the HybriWell 
Sealing System (Schleicher & Schuell, Germany) and 40 µl of kinase mixture, containing a 
protein kinase and ^^P-γ-ATP as a labeling reagent, was added to the substrates on ice. 

8. The reaction was incubated at 30°C for 30 min in a humidity chamber. The 
seals were peeled from the slides, and the slides were immersed in a large volume of PBS 
buffer containing 50 mM EDTA. The slides were further washed with the same buffer 
3 x 15 min at RT. The washed slides were then dried with Kimwipes. 

9. To acquire the signals, the slides were exposed to a Phosphoimager screen 
and the data analyzed using ImageQuant software. 



XI. References Cited 

All references cited herein are incorporated herein by reference in their entirety and 
for all purposes to the same extent as if each individual publication or patent or patent 
application was specifically and individually indicated to be incorporated by reference in its 
entirety for all purposes. 

Many modifications and variations of this invention can be made without departing 
from its spirit and scope, as will be apparent to those skilled in the art. The specific 
embodiments described herein are offered by way of example only, and the invention is to 
be limited only by the terms of the appended claims, along with the full scope of 
equivalents to which such claims are entitled. 


We claim: 




1. A positionally addressable array comprising a plurality of different substances, 
selected from the group consisting of proteins, molecules comprising functional 
domains of said proteins, whole cells, and protein-containing cellular material, on a 
solid support, with each different substance being at a different position on the solid 
support, wherein the plurality of different substances consists of at least 100 
different substances per cm². 

2. The array of claim 1 wherein the plurality of different substances consists of 
between 100 and 1,000 different substances per cm². 

3. The array of claim 1 wherein the plurality of different substances consists of 
between 1,000 and 10,000 different substances per cm². 

4. The array of claim 1 wherein the plurality of different substances consists of 
between 10,000 and 100,000 different substances per cm². 

5. The array of claim 1 wherein the plurality of different substances consists of 
between 100,000 and 1,000,000 different substances per cm². 

6. The array of claim 1 wherein the plurality of different substances consists of 
between 1,000,000 and 10,000,000 different substances per cm². 

7. The array of claim 1 wherein the plurality of different substances consists of 
between 10,000,000 and 25,000,000 different substances per cm². 

8. The array of claim 1 wherein the plurality of different substances consists of at least 
25,000,000 different substances per cm². 

9. The array of claim 1 wherein the plurality of different substances consists of at least 
10,000,000,000 different substances per cm². 

10. The array of claim 1 wherein the plurality of different substances consists of at least 
10,000,000,000,000 different substances per cm². 

11. The array of claim 1 wherein the solid support is a glass slide. 

12. The array of claim 1 wherein each different substance is present in a different well 
on the surface of the solid support. 

13. The array of claim 12 wherein each different substance in a different well is bound 
to the surface of the solid support. 

14. The array of claim 12 wherein each different substance in a different well is not 
bound to the surface of the solid support. 

15. The array of claim 12 wherein each different substance in a different well is in 
solution. 

16. The array of claim 12 wherein each well contains reagents for assaying biological 
activity of a protein or molecule. 

17. A positionally addressable array comprising a plurality of different proteins, or 
molecules comprising functional domains of said proteins, on a solid support, with 
each different protein or molecule being at a different position on the solid support, 
wherein the plurality of proteins or molecules consists of at least 50% of all 
expressed proteins with the same type of biological activity in the genome of an 
organism. 

18. The array of claim 17 wherein the plurality of proteins or molecules consists of at 
least 75% of all expressed proteins with the same type of biological activity in the 
genome of an organism. 

19. The array of claim 17 wherein the plurality of proteins or molecules consists of at 
least 90% of all expressed proteins with the same type of biological activity in the 
genome of an organism. 

20. The array of claim 17 wherein the organism is selected from the group consisting of 
bacteria, yeast, insects, and mammals. 

21. The array of claim 17 wherein the expressed proteins with a biological activity of 
interest are selected from the group consisting of kinases, phosphatases, proteases, 
glycosidases, acetylases, other group transferring enzymes, nucleic acid binding 
proteins, hormone binding proteins, and DNA binding proteins. 

22. A positionally addressable array comprising a plurality of different substances 
selected from the group consisting of proteins, molecules comprising functional 
domains of said proteins, whole cells, and protein-containing cellular material, on a 
solid support, with each different substance being at a different position on the solid 
support, wherein the solid support is selected from the group consisting of ceramics, 
amorphous silicon carbide, castable oxides, polyimides, polymethylmethacrylates, 
polystyrenes and silicone elastomers. 

23. The array of claim 22 wherein the solid support is silicone elastomer. 

24. The array of claim 23 wherein the solid support is polydimethylsiloxane. 

25. A positionally addressable array comprising a plurality of different substances, 
selected from the group consisting of proteins, molecules comprising functional 
domains of said proteins, whole cells, and protein-containing cellular material, on a 
solid support, with each different substance being at a different position on the solid 
support, wherein the plurality of different substances are attached to the solid 
support via a 3-glycidooxypropyltrimethoxysilane linker. 

26. An array comprising a plurality of wells on the surface of a solid support wherein 
the density of the wells is at least 100 wells/cm². 

27. The array of claim 26 wherein the density of the wells is between 100 and 1,000 
wells/cm². 

28. The array of claim 26 wherein the density of the wells is between 1,000 and 10,000 
wells/cm². 

29. The array of claim 26 wherein the density of the wells is between 10,000 and 
100,000 wells/cm². 

30. The array of claim 26 wherein the density of the wells is between 100,000 and 
1,000,000 wells/cm². 

31. The array of claim 26 wherein the density of the wells is between 1,000,000 and 
10,000,000 wells/cm². 

32. The array of claim 26 wherein the density of the wells is between 10,000,000 and 
25,000,000 wells/cm². 

33. The array of claim 26 wherein the density of the wells is at least 25,000,000 
wells/cm². 

34. The array of claim 26 wherein the density of the wells is at least 10,000,000,000 
wells/cm². 

35. The array of claim 26 wherein the density of the wells is at least 10,000,000,000,000 
wells/cm². 

36. The array of claim 26 wherein a plurality of different substances, selected from the 
group consisting of proteins, molecules comprising functional domains of said 
proteins, whole cells, and protein-containing cellular material, is present in the 
wells, with each different substance being present in a different well. 

37. The array of claim 36 wherein each different substance in a different well is bound 
to the surface of the solid support. 

38. The array of claim 37 wherein each different substance in a different well is 
covalently bound to the surface of the solid support. 

39. The array of claim 38 wherein each different substance in a different well is 
covalently bound to the surface of the solid support through a linker. 

40. The array of claim 39 wherein the linker is 3-glycidooxypropyltrimethoxysilane. 

41. The array of claim 36 wherein each different substance in a different well is 
non-covalently bound to the surface of the solid support. 

42. The array of claim 36 wherein each different substance in a different well is free of 
binding to the surface of the solid support. 
43. The array of claim 36 wherein each different substance in a different well is in 
solution. 

44. The array of claim 26 wherein each well contains reagents for assaying biological 
activity of a protein or molecule. 

45. The array of claim 26 wherein volumes of the wells are between 1 pl and 5 µl. 

46. The array of claim 26 wherein volumes of the wells are between 1 nl and 1 µl. 

47. The array of claim 26 wherein volumes of the wells are between 100 nl and 300 nl. 

48. The array of claim 26 wherein the bottoms of the wells are square, round, V-shaped 
or U-shaped. 

49. A method of making a positionally addressable array comprising a plurality of wells 
on the surface of a solid support comprising the step of: 

casting an array from a microfabricated mold designed to produce a density 
of wells on a solid surface of greater than 100 wells/cm². 

50. A method of making a positionally addressable array comprising a plurality of wells 
on the surface of a solid support comprising the steps of: 

(a) casting a secondary mold from a microfabricated mold designed to 
produce a density of wells on a solid surface of greater than 100 wells/cm²; and 

(b) casting at least one array from the secondary mold. 

51. The method of claim 49 or 50 wherein the casting of an array further comprises the 
steps of: 

(a) covering the mold with a liquid cast material; and 

(b) curing the cast material until the cast is solid. 

52. The method of any of claims 49-51 wherein the density of the wells is between 100 
and 1,000 wells/cm². 

53. The method of any of claims 49-51 wherein the density of the wells is between 
1,000 and 10,000 wells/cm². 

54. The method of any of claims 49-51 wherein the density of the wells is between 
10,000 and 100,000 wells/cm². 

55. The method of any of claims 49-51 wherein the density of the wells is between 
100,000 and 1,000,000 wells/cm². 

56. The method of any of claims 49-51 wherein the density of the wells is between 
1,000,000 and 10,000,000 wells/cm². 

57. The method of any of claims 49-51 wherein the density of the wells is between 
10,000,000 and 25,000,000 wells/cm². 

58. The method of any of claims 49-51 wherein the density of the wells is greater than 
25,000,000 wells/cm². 
59. The method of any of claims 49-51 wherein the density of the wells is greater than 
10,000,000,000 wells/cm². 

60. The method of any of claims 49-51 wherein the density of the wells is greater than 
10,000,000,000,000 wells/cm². 

61. The method of claim 49 or 50 wherein the array is cast from silicone elastomer. 

62. The method of claim 49 or 50 wherein the array is cast from polydimethylsiloxane. 

63. The method of claim 51 wherein the liquid cast material is a silicone elastomer. 

64. The method of claim 51 wherein the liquid cast material is polydimethylsiloxane. 

65. A method of using a positionally addressable array comprising a plurality of 
different substances, selected from the group consisting of proteins, molecules 
comprising functional domains of said proteins, whole cells, and protein-containing 
cellular material, on a solid support, with each different substance being at a 
different position on the solid support, wherein the plurality of different substances 
consists of at least 100 different substances per cm², comprising the steps of: 

(a) contacting a probe with the array; and 

(b) detecting protein/probe interaction. 

66. A method of using a positionally addressable array comprising a plurality of 
different proteins, or molecules comprising functional domains of said proteins, on a 
solid support, with each different protein or molecule being at a different position on 
the solid support, wherein the plurality of proteins or molecules consists of at least 
50% of all expressed proteins with the same type of biological activity in the 
genome of an organism, comprising the steps of: 

(a) contacting a probe with the array; and 

(b) detecting protein/probe interaction. 

67. A method of using a positionally addressable array comprising a plurality of 
different substances, selected from the group consisting of proteins, molecules 
comprising functional domains of said proteins, whole cells, and protein-containing 
cellular material, on a solid support, with each different substance being at a 
different position on the solid support, wherein the solid support is selected from the 
group consisting of ceramics, amorphous silicon carbide, castable oxides, 
polyimides, polymethylmethacrylates, polystyrenes and silicone elastomers, 
comprising the steps of: 

(a) contacting a probe with the array; and 

(b) detecting protein/probe interaction. 

68. A method of using a positionally addressable array comprising a plurality of 
different substances, selected from the group consisting of proteins, molecules 
comprising functional domains of said proteins, whole cells, and protein-containing 
cellular material, on a solid support, with each different substance being at a 
different position on the solid support, wherein the plurality of different substances 
are attached to the solid support via a 3-glycidooxypropyltrimethoxysilane linker, 
comprising the steps of: 

(a) contacting a probe with the array; and 

(b) detecting protein/probe interaction. 

69. The method of any of claims 65-68 wherein the probe is an enzyme substrate or 
inhibitor. 

70. The method of claim 69 wherein the probe is a substrate or inhibitor of an enzyme 
chosen from the group consisting of kinases, phosphatases, proteases, glycosidases, 
acetylases, and other group transferring enzymes. 

71. The method of any of claims 65-68 wherein the probe is chosen from the group 
consisting of proteins, oligonucleotides, polynucleotides, DNA, RNA, small 
molecule substrates, drug candidates, receptors, antigens, steroids, phospholipids, 
antibodies, glutathione, immunoglobulin domains, maltose, nickel, dihydrotrypsin, 
and biotin. 

72. The method of any of claims 65-68 wherein detection of protein/probe interaction is 
via mass spectrometry. 

73. The method of any of claims 65-68 wherein detection of protein/probe interaction is 
via a method chosen from the group consisting of chemiluminescence, fluorescence, 
radiolabeling, and atomic force microscopy. 

74. A method of using a positionally addressable array comprising the steps of: 

(a) depositing a plurality of different substances, selected from the group 
consisting of proteins, molecules comprising functional domains of 
said proteins, whole cells, and protein-containing cellular material, on 
a solid support, with each different substance being at a different 
position on the solid support, wherein the plurality of different 
substances consists of at least 100 different substances per cm²; 

(b) contacting a probe with the array; and 

(c) detecting protein/probe interaction. 

75. The method of claim 74 wherein the solid support is a glass slide. 

76. A method of using a positionally addressable array comprising the steps of: 

(a) depositing a plurality of different proteins, or molecules comprising 
functional domains of said proteins, on a solid support, with each 
different protein or molecule being at a different position on the solid 
support, wherein the plurality of proteins or molecules consists of at 
least 50% of all expressed proteins with the same type of biological 
activity in the genome of an organism; 

(b) contacting a probe with the array; and 

(c) detecting protein/probe interaction. 

77. The method of claim 76 wherein the solid support is a glass slide. 

78. A method of making a positionally addressable array comprising the steps of: 

(a) casting an array from a microfabricated mold designed to produce a 
density of wells on a solid surface of greater than 100 wells/cm²; and 

(b) depositing in wells a plurality of different substances, selected from 
the group consisting of proteins, molecules comprising functional 
domains of said proteins, whole cells, and protein-containing cellular 
material, on a solid support, with each different substance being in a 
different well on the solid support. 

79. A method of making a positionally addressable array comprising the steps of: 

(a) casting a secondary mold from a microfabricated mold designed to 
produce a density of wells on a solid surface of greater than 100 
wells/cm²; 

(b) casting at least one array from the secondary mold; and 

(c) depositing in wells a plurality of different substances, selected from 
the group consisting of proteins, molecules comprising functional 
domains of said proteins, whole cells, and protein-containing cellular 
material, on a solid support, with each different substance being in a 
different well on the solid support. 

80. A method of identifying antigen that activates a lymphocyte comprising the steps 
of: 

(a) contacting a positionally addressable array with a plurality of 
lymphocytes, said array comprising a plurality of potential antigens 
on a solid support, with each different antigen being at a different 
position on the solid support, wherein the density of different 
antigens is at least 100 different antigens per cm²; and 

(b) detecting positions on the solid support where lymphocyte activation 
occurs. 

81. The method of Claim 80 wherein the lymphocytes are derived from a patient. 


82. The method of Claim 80 wherein the antigens are selected from the group consisting 
of antigens of pathogens, antigens of autologous tissues, tissue-specific antigens, 
disease-specific antigens, and synthetic antigens. 

83. The method of Claim 80 wherein lymphocyte activation is detected by measuring 
antibody synthesis. 

84. The method of Claim 80 wherein lymphocyte activation is detected by measuring 
the incorporation of ³H-thymidine by a lymphocyte. 

85. The method of Claim 80 wherein lymphocyte activation is detected by determining 
the expression of cell surface molecules induced or suppressed by lymphocyte 
activation. 

86. The method of Claim 80 wherein lymphocyte activation is detected by determining 
the expression of secreted molecules induced by lymphocyte activation. 

87. The method of Claim 80 wherein lymphocyte activation is detected by measuring 
the release of ⁵¹chromium. 

88. A method of determining the specificity of an antibody preparation comprising the 
steps of: 

(a) contacting a positionally addressable array with an antibody 
preparation, said array comprising a plurality of potential antigens on 
a solid support, with each different antigen being at a different 
position on the solid support, wherein the density of different 
antigens is at least 100 different antigens per cm²; and 

(b) detecting positions on the solid support where binding by an antibody 
in said antibody preparation occurs. 

89. The method of Claim 88 wherein the antibody preparation comprises antiserum, a 
monoclonal antibody, or a polyclonal antibody. 

90. The method of Claim 88 wherein the antibody preparation comprises Fab fragments, 
chimeric, single chain, humanized, or synthetic antibodies. 

91. The method of Claim 88 wherein antibody binding is detected by contacting the 
array with a fluorescently labeled secondary antibody that binds to antibody in said 
antibody preparation; removing unbound secondary antibody; and detecting bound 
label on the array. 

92. A method of identifying a mitogen comprising the steps of: 

(a) contacting a positionally addressable array with a population of cells, 
said array comprising a plurality of different substances, selected 
from the group consisting of proteins, molecules comprising 
functional domains of said proteins, whole cells, and protein- 
containing cellular material, on a solid support, with each different 
substance being at a different position on the solid support, wherein 
the density of different substances is at least 100 different substances 
per cm²; and 

(b) detecting positions on the solid support where mitogenic activity is 
induced in a cell. 




93. A kit comprising: 

(a) one or more arrays comprising a plurality of wells on the surface of a 
solid support wherein the density of the wells is at least 100 
wells/cm²; and 

(b) in one or more containers, one or more probes, reagents, or other 
molecules. 

94. The kit according to Claim 93 wherein said one or more containers comprise a 
reagent useful for assaying biological activity of a protein. 

95. The kit according to Claim 93 wherein said one or more containers comprise a 
reagent useful for assaying interactions between a probe and a protein. 

96. The kit according to Claim 94 or 95 wherein the reagent is in solution. 

97. The kit according to Claim 94 or 95 wherein the reagent is in solid form. 

98. The kit according to Claim 94 or 95 wherein the reagent is contained in each well of 
the array. 

99. The kit according to Claim 94 or 95 wherein the reagent is contained in selected 
wells of the array. 

100. The kit according to Claim 93 wherein said one or more containers contain a 
solution reaction mixture for assaying biological activity of a protein or molecule. 

101. The kit according to Claim 100 wherein said one or more containers contain one or 
more substrates to assay said biological activity. 

102. A kit comprising: 

(a) one or more positionally addressable arrays comprising a plurality of 
different substances, selected from the group consisting of proteins, 
molecules comprising functional domains of said proteins, whole 
cells, and protein-containing cellular material, on a solid support, 
with each different substance being at a different position on the solid 
support, wherein the plurality of different substances consists of at 
least 100 different substances per cm²; and 

(b) in one or more containers, one or more probes, reagents, or other 
molecules. 

103. The kit according to Claim 102 wherein the substances are attached to the surface of 
wells on the solid support. 

104. The kit according to Claim 103 wherein the substances are proteins, and the proteins 
are at least 50% of all expressed proteins with the same type of biological activity in 
an organism. 

105. The kit according to Claim [illegible] wherein the substances are proteins or molecules 
comprising functional domains of said proteins, and the proteins or molecules are 
selected from the group consisting of kinases, phosphatases, proteases, glycosidases, 
acetylases, nucleic acid binding proteins, and hormone binding proteins. 
ABSTRACT 

The present invention relates to protein chips useful for the large-scale study of 
protein function where the chip contains densely packed reaction wells. The invention also 
relates to methods of using protein chips to assay simultaneously the presence, amount, 
and/or function of proteins present in a protein sample or on one protein chip, or to assay 
the presence, relative specificity, and binding affinity of each probe in a mixture of probes 
for each of the proteins on the chip. The invention also relates to methods of using the 
protein chips for high density and small volume chemical reactions. Also, the invention 
relates to polymers useful as protein chip substrates and methods of making protein chips. 
The invention further relates to compounds useful for the derivatization of protein chip 
substrates. 
- 3 - Snyder et al. 

09/849,781 

Amendments to the Claims 

This listing of claims will replace all prior versions, and listings of claims in the 

application. 

Claim 1. (Currently amended) A positionally addressable array comprising 
a plurality of different substances on a solid support, with each different substance being 
at a different position on the solid support, wherein the density of the different 
substances on the solid support is at least 100 different substances per cm² and wherein 
the plurality of different substances comprises at least 61 purified active kinases or 
functional kinase domains thereof of a mammal, 61 purified active kinases or functional 
kinase domains thereof of a yeast, or 61 purified active kinases or functional kinase 
domains thereof of a Drosophila. 

Claim 2. (Previously presented) The array of claim 1 wherein the density of 
the different substances on the array is between 100 and 1,000 different substances 
per cm². 

Claim 3. (Previously presented) The array of claim 1 wherein the density of 
the different substances on the array is between 1,000 and 10,000 different substances 
per cm². 

Claim 4. (Previously presented) The array of claim 1 wherein the density of 
the different substances on the array is between 10,000 and 100,000 different substances 
per cm². 

Claim 5. (Previously presented) The array of claim 1 wherein the density of 
the different substances on the array is between 100,000 and 1,000,000 different 
substances per cm². 

Claim 6. (Previously presented) The array of claim 1 wherein the density of 
the different substances on the array is between 1,000,000 and 10,000,000 different 
substances per cm². 

Claim 7. (Previously presented) The array of claim 1 wherein the density of 
the different substances on the array is between 10,000,000 and 25,000,000 different 
substances per cm². 

Claim 8. (Previously presented) The array of claim 1 wherein the density of 
the different substances on the array is at least 25,000,000 different substances per cm². 

Claim 9. (Previously presented) The array of claim 1 wherein the density of 
the different substances on the array is at least 10,000,000,000 different substances per 
cm². 

Claim 10. (Previously presented) The array of claim 1 wherein the density of 
the different substances on the array is at least 10,000,000,000,000 different substances 
per cm². 

Claim 11. (Original) The array of claim 1 wherein the solid support is a glass 
slide. 

Claim 12. (Withdrawn) The array of claim 1 wherein each different substance 
is present in a different well on the surface of the solid support. 

Claim 13. (Withdrawn) The array of claim 12 wherein each different 
substance in a different well is bound to the surface of the solid support. 

Claim 14. (Withdrawn) The array of claim 12 wherein each different 
substance in a different well is not bound to the surface of the solid support. 

Claim 15. (Withdrawn) The array of claim 12 wherein each different 
substance in a different well is in solution. 

Claim 16. (Withdrawn) The array of claim 12 wherein each well contains 
reagents for assaying biological activity of a protein or molecule. 

Claims 17-92. (Canceled). 

Claim 93. (Withdrawn) A kit comprising: 

(a) one or more arrays of claim 1 comprising a plurality of wells on 
the surface of the solid support wherein the density of the wells is at least 100 wells/cm², 
wherein each of said different substances is present in a different well; and 

(b) in one or more containers, one or more probes, reagents, or other 
second molecules. 

Claim 94. (Withdrawn) The kit according to claim 93 wherein said one or 
more containers comprise a reagent useful for assaying biological activity of a protein. 

Claim 95. (Withdrawn) The kit according to claim 93 wherein said one or 
more containers comprise a reagent useful for assaying interactions between a probe and 
a protein. 

Claim 96. (Withdrawn) The kit according to claim 94 or 95 wherein the 
reagent is in solution. 

Claim 97. (Withdrawn) The kit according to claim 94 or 95 wherein the 
reagent is in solid form. 
Claim 98. (Withdrawn) The kit according to claim 94 or 95 wherein the 
reagent is contained in each well of the array. 

Claim 99. (Withdrawn) The kit according to claim 94 or 95 wherein the 
reagent is contained in selected wells of the array. 

Claim 100. (Withdrawn) The kit according to claim 93 wherein said one or 
more containers contain a solution reaction mixture for assaying biological activity. 

Claim 101. (Withdrawn) The kit according to claim 100 wherein said one or 
more containers contain one or more substrates to assay said biological activity. 

Claims 102-105. (Canceled). 

Claim 106. (Withdrawn) The array of claim 1 wherein the solid support is 
composed of a silicone elastomeric material. 

Claim 107. (Withdrawn) The array of claim 106 wherein the silicone 
elastomeric material is polydimethylsiloxane. 

Claims 108 to 111. (Canceled). 

Claim 112. (Withdrawn) The kit of claim 93 wherein the solid support is 
selected from the group consisting of a ceramic, amorphous silicon carbide, castable 
oxide, polyimide, polymethylmethacrylate, polystyrene, and silicone elastomer. 

Claim 113. (Withdrawn) The kit of claim 112 wherein the solid support is a 
silicone elastomer. 

Claim 114. (Withdrawn) The kit of claim 112 wherein the solid support is a 
polydimethylsiloxane. 
Claim 115. (Withdrawn) The kit of claim 93 wherein the plurality of different 
substances are attached to the solid support via a 3-glycidoxypropyltrimethoxysilane 
linker. 

Claim 116. (Withdrawn) The kit of claim 93 wherein the density of the wells 
is between 100 and 1,000 wells/cm². 

Claim 117. (Withdrawn) The kit of claim 93 wherein the density of the wells 
is between 1,000 and 10,000 wells/cm². 

Claim 118. (Withdrawn) The kit of claim 93 wherein the density of the wells 
is between 10,000 and 100,000 wells/cm². 

Claim 119. (Withdrawn) The kit of claim 93 wherein the density of the wells 
is between 100,000 and 1,000,000 wells/cm². 

Claim 120. (Withdrawn) The kit of claim 93 wherein the density of the wells 
is between 1,000,000 and 10,000,000 wells/cm². 

Claim 121. (Withdrawn) The kit of claim 93 wherein the density of the wells 
is between 10,000,000 and 25,000,000 wells/cm². 

Claim 122. (Withdrawn) The kit of claim 93 wherein each different substance 
in a different well is bound to the surface of the solid support. 

Claim 123. (Withdrawn) The kit of claim 122 wherein each different 
substance in a different well is covalently bound to the surface of the solid support. 

Claim 124. (Withdrawn) The kit of claim 123 wherein each different 
substance in a different well is covalently bound to the surface of the solid support 
through a linker. 
Claim 125. (Withdrawn) The kit of claim 124 wherein the linker is 3- 
glycidoxypropyltrimethoxysilane. 

Claim 126. (Withdrawn) The kit of claim 122 wherein each different 
substance in a different well is non-covalently bound to the surface of the solid support. 

Claim 127. (Withdrawn) The kit of claim 93 wherein each different substance 
in a different well is free of binding to the surface of the solid support. 

Claim 128. (Withdrawn) The kit of claim 93 wherein each different substance 
in a different well is in solution. 

Claim 129. (Withdrawn) The kit of claim 93 wherein each well contains 

reagents for assaying biological activity. 

Claim 130. (Withdrawn) The kit of claim 93 wherein volumes of the wells are 
between 1 pl and 5 µl. 

Claim 131. (Withdrawn) The kit of claim 93 wherein volumes of the wells are 
between 1 nl and 1 µl. 

Claim 132. (Withdrawn) The kit of claim 93 wherein volumes of the wells are 
between 100 nl and 300 nl. 

Claim 133. (Withdrawn) The kit of claim 93 wherein the bottoms of the wells 
are square, round, V-shaped or U-shaped. 

Claims 134-137. (Canceled). 

Claim 138. (Withdrawn) The array of claim 1 wherein the solid support is 
selected from the group consisting of a ceramic, amorphous silicon carbide, castable 
oxide, polyimide, polymethylmethacrylate, polystyrene, and silicone elastomer. 
Claim 139. (Withdrawn) The array of claim 1 wherein the solid support is a 
silicone elastomer. 

Claim 140. (Withdrawn) The array of claim 139 wherein the solid support is a 
polydimethylsiloxane. 

Claim 141. (Previously presented) The array of claim 1 wherein the plurality 
of different substances are attached to the solid support via a 3-glycidoxypropyl- 
trimethoxysilane linker. 

Claim 142. (Withdrawn) The array of claim 12 wherein the density of the 
wells is between 100 and 1,000 wells/cm². 

Claim 143. (Withdrawn) The array of claim 12 wherein the density of the 
wells is between 1,000 and 10,000 wells/cm². 

Claim 144. (Withdrawn) The array of claim 12 wherein the density of the 
wells is between 10,000 and 100,000 wells/cm². 

Claim 145. (Withdrawn) The array of claim 12 wherein the density of the 
wells is between 100,000 and 1,000,000 wells/cm². 

Claim 146. (Withdrawn) The array of claim 12 wherein the density of the 
wells is between 1,000,000 and 10,000,000 wells/cm². 

Claim 147. (Withdrawn) The array of claim 12 wherein the density of the 
wells is between 10,000,000 and 25,000,000 wells/cm². 

Claim 148. (Withdrawn) The array of claim 12 wherein each different 
substance in a different well is bound to the surface of the solid support. 

Claim 149. (Withdrawn) The array of claim 148 wherein each different 
substance in a different well is covalently bound to the surface of the solid support. 

Claim 150. (Withdrawn) The array of claim 149 wherein each different 
substance in a different well is covalently bound to the surface of the solid support 
through a linker. 

Claim 151. (Withdrawn) The array of claim 150 wherein the linker is 3- 
glycidoxypropyltrimethoxysilane. 

Claim 152. (Withdrawn) The array of claim 148 wherein each different 
substance in a different well is non-covalently bound to the surface of the solid support. 

Claim 153. (Withdrawn) The array of claim 12 wherein each different 
substance in a different well is free of binding to the surface of the solid support. 

Claim 154. (Withdrawn) The array of claim 12 wherein each different 
substance in a different well is in solution. 

Claim 155. (Withdrawn) The array of claim 12 wherein each well contains 
reagents for assaying biological activity. 

Claim 156. (Withdrawn) The array of claim 12 wherein volumes of the wells 
are between 1 pl and 5 µl. 

Claim 157. (Withdrawn) The array of claim 12 wherein volumes of the wells 
are between 1 nl and 1 µl. 

Claim 158. (Withdrawn) The array of claim 12 wherein volumes of the wells 
are between 100 nl and 300 nl. 

Claim 159. (Withdrawn) The array of claim 12 wherein the bottoms of the 
wells are square, round, V-shaped or U-shaped. 

Claims 160-161. (Canceled). 
Claim 162. (Withdrawn) The kit of claim 93 wherein the organism is selected 
from the group consisting of human, primate, mouse, rat, cat, dog, horse, and cow. 

Claim 163. (Canceled). 

Claim 164. (Currently amended) The array of claim 1 wherein the mammal 
[[organism]] is selected from the group consisting of human, primate, mouse, rat, cat, dog, 
horse, and cow. 

Claim 165. (Withdrawn) The array of claim 12 wherein the organism is 
selected from the group consisting of human, primate, mouse, rat, cat, dog, horse, and 
cow. 



Claim 166. (Canceled). 

Claim 167. (Withdrawn) The kit of claim 162, wherein the organism is 
human. 

Claim 168. (Canceled). 

Claim 169. (Previously presented) The array of claim 164, wherein the 
organism is human. 

Claim 170. (Canceled). 

Claim 171. (Withdrawn) The kit of claim 162, wherein the organism is mouse. 

Claim 172. (Canceled). 

Claim 173. (Previously presented) The array of claim 164, wherein the 
organism is mouse. 

Claim 174. (Currently amended) The array of claim 164 [[166]], wherein the 
organism is mouse. 

Claim 175. (Withdrawn) The kit of claim 162, wherein the organism is rat. 

Claim 176. (Canceled). 

Claim 177. (Previously presented) The array of claim 164, wherein the 
organism is rat. 

Claims 178-180. (Canceled). 

Claim 181. (Previously presented) The positionally addressable protein array 
of claim 1, wherein the plurality of different substances comprises 61 different purified 
active kinases of an organism. 

Claim 182. (Currently amended) The positionally addressable protein array of 
claim 1, wherein the plurality of different substances comprises 92 different purified 
active kinases of a mammal, a yeast, or a Drosophila [[an organism]]. 

Claim 183. (Currently amended) The positionally addressable protein array of 
claim 1, wherein the plurality of different substances comprises 110 different purified 
active kinases of a mammal, a yeast, or a Drosophila [[an organism]]. 

Claim 184. (Currently amended) The positionally addressable protein array of 
claim 1, wherein the plurality of different substances comprises 116 different purified 
active kinases of a mammal, a yeast, or a Drosophila [[an organism]]. 

Claim 185. (Currently amended) The positionally addressable protein array of 
claim 1, wherein the plurality of different substances comprises 119 different purified 
active kinases of a mammal, a yeast, or a Drosophila [[an organism]]. 

Claim 186. (Currently amended) The positionally addressable protein array of 
claim 1, wherein the plurality of different substances comprises 122 purified active 
different kinases of a mammal, a yeast, or a Drosophila [[an organism]]. 

Claim 187. (Canceled). 

Claim 188. (Previously presented) The positionally addressable array of claim 
1, wherein the kinases are yeast kinases. 

Claims 189-192. (Canceled). 

Claim 193. (Previously presented) The positionally addressable array of claim 
1, wherein the different substances are 61 purified active kinases. 

Claim 194. (Currently amended) The positionally addressable array of claim 
193, wherein the kinases are [[members of the]] serine/threonine kinase family members, 
[[members of the]] tyrosine kinase family members, or [[the kinases are members of the]] 
serine/threonine kinase family and [[members of the]] tyrosine kinase family members. 

Claim 195. (Currently amended) The positionally addressable array of claim 
1, wherein the functional kinase domains are functional kinase domains of [[members of 
the]] serine/threonine kinase family members, functional kinase domains of [[members of 
the]] tyrosine kinase family members, or wherein the functional kinase domains comprise 
functional kinase domains of [[kinases that are members of the]] serine/threonine kinase 
family members and functional kinase domains of [[kinases that are members of the]] 
tyrosine kinase family members. 

Claim 196. (Withdrawn) The positionally addressable array of claim 1, 
wherein the kinases or functional kinase domains are recombinant proteins. 

Claim 197. (Withdrawn) The positionally addressable array of claim 196, 
wherein the recombinant proteins are recombinant fusion proteins. 

Claim 198. (Canceled). 

DETAILED ACTION 
Continued Examination Under 37 CFR 1.114 

A request for continued examination under 37 CFR 1.114, 
including the fee set forth in 37 CFR 1.17(e), was filed in this 
application after final rejection. Since this application is 
eligible for continued examination under 37 CFR 1.114, and the 
fee set forth in 37 CFR 1.17(e) has been timely paid, the 
finality of the previous Office action has been withdrawn 
pursuant to 37 CFR 1.114. Applicant's submission filed on 
4/2/09 has been entered. 

Claims Status 

Claims 1-16, 93-101, 106, 107, 112-133, 138-159, 162, 164- 
165, 167, 169, 171, 173-175, 177, 181-186, 188, and 193-197 
are pending. 

Claims 17-92, 102-105, 108-111, 134-137, 160-161, 163, 166, 
168, 170, 172, 176, 178-180, 187, 189-192 and 198 have been 
cancelled. 

Claims 12-16, 93-101, 106, 107, 112-133, 138-140, 142-159, 
162, 165, 167, 171, 175 and 196-197 are withdrawn from further 
consideration pursuant to 37 CFR 1.142(b), as being drawn to 
nonelected species, there being no allowable generic or linking 
claim. 

Claims 1-11, 141, 164, 169, 173, 174, 177, 181-186, 188 and 
193-195 are under consideration in this Office Action. 

The inadvertent indication that claims 173-174, 177, 181- 
186, 188 and 192-195 are withdrawn from consideration at page 2 
of the last Office action is regretted. These claims have been 
rejected throughout the Office action, as stated by applicants 
and the rejections are reiterated as shown below. Applicants' 
request that these claims be confirmed as not withdrawn from 
examination in the instant application is hereby granted. 

Withdrawn Objection and Rejections 

In view of applicants' arguments and amendments to the 
claims, the 35 USC 112, first paragraph (new matter); second 
paragraph and obviousness double patenting rejections have been 
withdrawn. 

Claim Rejections - 35 USC § 112 

Claims 1-11, 141, 164, 169, 173, 174, 177, 181-186, 188 and 
193-195, as amended, are rejected under 35 U.S.C. 112, first 
paragraph, as failing to comply with the written description 
requirement. The claim(s) contains subject matter, which was 
not described in the specification in such a way as to 
reasonably convey to one skilled in the relevant art that the 
inventor(s), at the time the application was filed, had 
possession of the claimed invention. 

A "written description of an invention involving a 
chemical genus, like a description of a chemical species, 
requires a precise definition, such as by structure, formula 
[or] chemical name of the claimed subject matter sufficient to 
distinguish it from other materials". University of California 
v. Eli Lilly and Co., 43 USPQ2d 1398, 1405 (1997), quoting 
Fiers v. Revel, 25 USPQ2d 1601, 1606 (Fed. Cir. 1993). 

The claimed invention is drawn to a positionally 
addressable array comprising a plurality of different 
substances on a solid support, with each different substance 
being at a different position on the solid support, wherein 
the density of the different substances on the solid support 
is at least 100 different substances per cm2, and wherein the 
plurality of different substances comprises 61 purified active 
kinases or functional kinase domains thereof of a mammal, 61 
purified active kinases or functional kinase domains thereof 
of a yeast, or 61 purified active kinases or functional kinase 
domains thereof of a Drosophila. 

The specification fails to provide an adequate written 
description of 61 purified kinase and functional domains thereof 
from any organism such as mammals, bacteria, viruses. The 
specification provides general statements of these various 
kinases in an organism which are not a detailed description of the 
invention. The detailed description in the specification (Example 
I, page 27) describes the 122 kinase genes specifically from the 
yeast genome, not from the broad claimed any kinase from any 
type of organisms or functional domains thereof. A written 
description of a single species would not be a written 
description for the genus as claimed. At the time of applicants' 
invention kinases in any organism included in the huge scope of 
the claim has not been fully characterized such that it has been 
positioned in an array without denaturing the purified protein. 
A skilled artisan recognizes that one cannot rule out the 
possibility that kinases other than the desired enzyme can 
contaminate any type of purification preparations. 
Notwithstanding this, the kind/type and preparation of a 
substrate compatible with the purified protein is also a factor 
to consider for a purified protein to be active in an array. 
Furthermore, "although most of the kinases were active in [our] 
assays, several were not. Presumably, our preparations of these 
latter kinases either lack sufficient quantities of an activator 
or were not purified under activating conditions. For example, 
Cdc28 which was not active in our assays, might be lacking its 
activating cyclins. For the case of Hog1, cells were treated 
with high salt to activate the enzyme...." (paragraph [0161] of 
the instant specification, publication no. 20030207467). 

Attention is also drawn to the numerous prior art cited by 
applicants, inter alia, the Anderson reference, which teaches 
the numerous unforeseeable factors of a purified kinase 
positioned in an array. 

Anderson states that: 

...protein microarrays have still not found widespread use, 
in part because producing them is challenging. 
Historically, it has required the high-throughput 
production and purification of protein, which then must be 
spotted on the arrays. Once printed, concerns remain about 
the shelf life of proteins on the arrays. 

Shaw et al (Drug Discovery and Development, Exhibit B) 
concords with the statement that: 

"[i]t was first thought that protein biochips would just be 
an extension of DNA microarrays, and that hasn't exactly 
panned out," says Bodovitz. That's because proteins have 
proven to be much trickier to work with in array format 
than their genomic counterparts. First of all, there are 
issues of stability. Membrane proteins, for example, make 
up the majority of potential drug targets, but they're 
particularly challenging to stabilize. Then there's the 
choice of immobilization technique, which determines how 
well the target protein presents itself to the capture 
agent, and the problem of nonspecific binding. And of 
course, proteins are inherently unstable outside their 
natural habitat of living cells, making them much more 
challenging than DNA to tag and manipulate. 

An applicant shows possession of the claimed invention by 
describing the claimed invention with all of its limitations 
using such descriptive means as words, structures and formulas 
to show that the invention is complete. Lockwood v. American 
Airlines, Inc., 107 F.3d 1565, 1572, 41 USPQ2d 1961, 1966 (Fed. 
Cir. 1997); MPEP 2163. Herein, kinase has been described only in 
words. The characterization of the different kinases from one 
organism to another from the numerous kinases and numerous 
organisms has not been adequately described to distinguish one 
from the other. To date only a few organisms are fully 
characterized and the kinase region has not been fingerprinted 
in a partly or even fully characterized gene. The description 
lacks structural characterization of a purified kinase as 
generically claimed. It does not distinguish one kinase from 
another and/or one organism from another positioned in any 
kind/type of substrate array and reasonably expect the purified 
kinase to retain its active form. 



Claim Rejections - 35 USC §112 

Claims 1-11, 141, 164, 169, 173, 174, 177, 181-186, 188 and 
193-195, as amended, are rejected under 35 U.S.C. 112, first 
paragraph, because the specification, while being enabling for 
yeast protein kinases of the Ser/Thr and tyrosine kinase 
family, does not reasonably provide enablement for the broad 
scope of an array of 61 kinases and functional domain kinase 
from an organism as mammal, yeast or Drosophila. The 
specification does not enable any person skilled in the art to 
which it pertains, or with which it is most nearly connected, to 
make and use the invention commensurate in scope with these 
claims for reasons as repeated below. 

The claimed array comprises a broad genus of compositions. 
The claimed different substances encompass any members of the 
protein kinase from the organism of 'mammal, yeast, or 
Drosophila' which is broader than the enabling disclosure. The 
claimed array represents enormous scope because the claims do 
not place any limitations on the kind, number and/or length of 
kinase either singly from one family of organism or a 
combination(s) from the different numerous recited organisms. 
The instant specification is directed to an array comprising a 
plurality of different yeast protein kinase, specifically 122 
different yeast protein kinases of the Ser/Thr and tyrosine 
kinase family members (see specification: example I, pg. 27, 
line 19 thru pg. 35, line 20; example II, pg. 41, line 19 thru 
pg. 43, line 6). The specification does not provide reasonable 
assurance to one skilled in the art that the 61 kinases found in 
the yeast could be found in any or all of the organisms such as 
mammals especially the functional domain thereof. It is not 
apparent from the specification whether the same number of 
kinases or the kind of kinases or functional domain thereof can 
be found in any other organisms and made into an array. It is 
not apparent from the disclosure as to the functional domain of 
the kinase and the specific function attributed to said kinase 
positioned on the array. The general knowledge and level of 
skill in the art do not supplement the omitted description 
because specific, not general, guidance is what is needed. In a 
highly unpredictable art, as biotechnology, where one cannot 
predict whether one species would be predictive to the huge 
scope of the claim, one cannot make a priori statement without 
any experimental studies. Factors such as the compatibility of 
the array with the substrate and compounds disposed therein, the 
compounds (kinases) itself and other unpredictable variables can 
affect the active form of any kinase. Thus, one cannot predict 
from a single species its correspondence or extrapolation to the 
genus, as claimed. 

Response to Arguments 

Applicants note that the use of kinases from other 
organisms, including mammals and Drosophila, in the arrays of 
the presently claimed invention would not have required undue 
experimentation, but rather, simple, straightforward 
experiments. The protein kinases and functional kinase domains 
for use in the presently claimed invention are all well-known, 
well-characterized proteins that the ordinarily skilled artisan 
would easily comprehend. 

In reply, attention is drawn to the instant disclosure at 
e.g., Example I which states that the tyrosine kinase family 
members do not exist although seven protein kinases that 
phosphorylate have been reported. Applicants' arguments that 
array from any organisms are simple and straightforward are 
inconceivable given only the single species in the 
specification. 

Applicants rely on the Hanks reference for its disclosure 
that "there are now hundreds of different members [of the kinase 
superfamily] whose sequences are known." Hanks and Hunter, page 
576. Furthermore, kinases, for example serine kinases, were 
already readily recognized in 1995 by virtue of their conserved 
subdomains. Applicants similarly rely on numerous references and 
the Snyder declaration to show that kinases are well-characterized 
and known in the art. 

In reply, there is nothing in Hanks' reference that 
discloses these hundreds of kinases are from any mammals or 
Drosophila or from any other origin as broadly claimed. 

All of the references cited by applicants and the Snyder 
declaration provide also only general statements. The 
characterization is for a specific kinase not in purified and 
active form that has been positioned in an array for any kind of 
organisms fully or partly characterized. Each of the references 
and the declaration fail to take in consideration the numerous 
factors of the claim genus array besides the characterization of 
the kinase. For example, none of the references describes how 
the numerous kinases from different mammals, different strains 
of yeast or Drosophila can be purified, or how the pure kinase 
can be positioned in an array and still remain active. 
Applicants' statements throughout the REMARKS as to the 
skepticism in the art provide evidence as to the high 
unpredictability in the art. Furthermore, applicants in the 
specification (Published patent 20030207467) state at e.g., 
paragraph [0038] FIG. 1b, that from three attempts, 106 kinase 
proteins were purified. In spite of repeated attempts, the last 
14 of 119 GST fusions were undetectable by immunoblotting 
analysis. Further, at page 34 of the instant REMARKS, applicants 
state: 

In Ge, "UPA, a universal protein array system for 
quantitative detection of protein-protein, protein-DNA, 
protein-RNA and protein-ligand interactions," the author was 
only able to produce arrays comprising 48 proteins at a 
very low density, utilizing a traditional purification 
format. Extension of this disclosure to arrays comprising 
at least 100 different substances per cm2 would have 
required extensive, undue experimentation beyond the scope 
of the disclosure provided in this reference. (Emphasis 
added.) 

Thus, an enabling disclosure for a single species of a 
protein would not be enabling for the broad scope of other 
protein kinases from any kind of organisms. [Reciting the kinase 
is a Thr/Ser and Tyr of a yeast protein would overcome this 
rejection.] 

Claim Rejections - 35 USC §112 

Claims 1-11, 141, 164, 169, 173, 174, 177, 181-186, 188 and 
193-195, as amended, are rejected under 35 U.S.C. 112, second 
paragraph, as being indefinite for failing to particularly point 
out and distinctly claim the subject matter which applicant 
regards as the invention. 

A. In claim 1, the metes and bounds of the claim 
"functional kinase domains" is vague and indefinite as to the 
kind, length or region the domain encompasses in a purified, 
active form to be a functional kinase. It is not clear whether 
the functional kinase domain is positioned together with the 
full length kinase with the 61 different kinases from the 
different organisms. And, still expect to be pure and active 
without being masked by the full length kinases. 

B. Non-sequitur for "the solid support" in claims 11 and 
141. The base claim 1 does not recite a solid support. Also, 
"the organism" in claim 154, claim 159, claim 173, 175 and 177; 
"the serine/threonine kinase family", "the tyrosine kinase 
family" in claims 194 and 195 all lack antecedent basis of 
support from the base claim 1. 

C. Claim 174 depends on canceled claim 165. 

D. Claims 181-185 and 193 which each recite an organism 
broaden the base claim 1. Claim 1 recites mammals, yeast and 
Drosophila. However, organisms include e.g., bacteria, viruses 
and other organisms besides the ones recited therein. 
Furthermore, claim 1 recites 61 different kinases; however, 
claims 182-186 recite 92, 110, 116, 119 and 122 purified 
active kinases, which is broader than the 61 purified kinases 
recited in claim 1. 

The text of those sections of Title 35, U.S. Code not 
included in this action can be found in a prior Office action. 

Claim Rejections - 35 USC § 102/ § 103 

Claims 1-11, 141, 181-185, 188, and 193-195, as amended, 
are rejected under 35 U.S.C. 102(a) as anticipated by or, in the 
alternative, under 35 U.S.C. 103(a) as obvious over Uetz et al 
(Nature, 2/10/2000) for reasons of record as reiterated below. 
Uetz et al, throughout the reference, teach a protein array 
representing yeast genome encoded proteins (see Abstract of 
the reference). The reference teaches fusing roughly 6000 
potential ORFs (genes) from yeast genome (which comprises 
approximately 6000 genes) (see page 623, left col., 1st 
paragraph, and page 624, left col., 2nd paragraph). Uetz 
teaches the yeast proteins were expressed in 96-well assay 
plates (page 624, left col., bottom of 2nd paragraph), which 
reads on a solid support of the addressable array of claim 1 
because each well of the plates would have defined (or 
addressable positions). The reference also teaches each of the 
proteins encoded by a gene is expressed individually in 
individual wells of the plates as shown in Figure 1 of the 
reference (page 624), which reads on each protein being at a 
different position on a solid support of claim 1, for example. 
The claimed kinase present in the array would have been 
inherent to the yeast array taught by Uetz since yeast 
inherently contain kinase in its structure or would have been 
obvious to determine given the identified genome of yeast as 
taught by Uetz. 

Where the claimed and prior art products are identical or 
substantially identical, or are produced by identical or 
substantially identical processes, the PTO can require an 
applicant to prove that the prior art products do not 
necessarily or inherently possess the characteristics of 
his claimed product. See In re Ludtke, supra. Whether the 
rejection is based on "inherency" under 35 USC 102, on 
"prima facie obviousness" under 35 USC 103, jointly or 
alternatively, the burden of proof is the same as is 
evidenced by the PTO's inability to manufacture products or 
to obtain and compare prior art products. See In re Brown, 
59 CCPA 1036, 459 F.2d 531, 173 USPQ 685 (1972); In re 
Best, 195 USPQ 430 (CCPA 1977). 

Response to Arguments 

Applicants submit that Uetz does not disclose the 
preparation of an array comprising purified active kinases, and 
hence, cannot anticipate the presently claimed invention. As set 
forth in the Methods section of Uetz, at page 627, the disclosed 
arrays were prepared by transferring patches of transformed 
yeast cells into wells of a micro-array assay plate. Uetz does 
not disclose any purification of the yeast proteins prior to 
placement in the assay plate, just simply transfer of the 
transformed cells. Hence, Uetz does not disclose the use of 
purified kinases or functional kinase domains, as recited in 
present claim 1. Applicants acknowledge that, even assuming the 
arrays disclosed in Uetz comprise 61 kinases, there is no 
disclosure in Uetz sufficient to render obvious the construction 
of an array of at least 61 kinases or functional kinase domains, 
in which the array comprises kinases that are purified and 
active, as recited in present claim 1. 

In response, applicants' arguments as to the construction 
of the array are not commensurate in scope with the claims. The 
claims are drawn to an array and not to a method of making the 
array. Nonetheless, attention is drawn to the disclosure of Uetz 
at e.g., paragraph bridging col. 1 and col. 2 which recites: 

To examine protein activity in a format that allows the 
assay of every predicted ORF, we constructed an array of 
hybrid proteins. At least two general types of protein 
array may be envisioned: those composed of living 
transformants and those composed solely of the purified 
proteins (7). The two-hybrid array used here is a set of 
yeast colonies derived from about 6,000 individual 
transformants 



Claims 1-11, 141, 181-186, 188, and 193-195 are rejected 
under 35 U.S.C. 103(a) as being unpatentable over Shalon (WO 
95/35505) in view of Felder et al (USP 6458533) or Lafferty (USP 
6972183) for reasons of record as repeated below. 

Shalon discloses at e.g., page 12, lines 3-9: 

A microarray as an array of regions having a density of 
discrete regions of at least about 100/cm2, and preferably 
at least about 1000/cm2. The regions in a microarray have 
typical dimensions, e.g., diameters, in the range of 
between about 10-250 µm, and are separated from other 
regions in the array by about the same distance. 

Shalon discloses at e.g., page 30, line 30 up to page 32, line 
15: 

Sheets of plastic-backed nitrocellulose where each 
microarray could contain, for example, 100 DNA fragments 
representing all known mutations of a given gene. The 
region of interest from each of the DNA samples from 96 
patients could be amplified, labeled, and hybridized to the 
96 individual arrays with each assay performed in 100 
microliters of hybridization solution. In addition to the 
genetic applications listed above, arrays of... enzymes.... 
preparations 

Shalon discloses an array of enzymes and not kinase as 
claimed. However, Felder discloses at Example 18: 

Kinases are enzymes that attach a phosphate to proteins. 
Many have been shown to stimulate normal and neoplastic 
cell growth. Hence, compounds that inhibit specific kinases 
(but not all kinases) can be used to test whether the 
kinases are involved in pathology and, if so, to serve as 
starting points for pharmaceutical development... Each kinase 
has substrates that are partially identified, as short 
peptides that contain a tyrosine. Some of the kinase 
specificities overlap so that different kinases may 
phosphorylate some peptides equally but others 
preferentially. For the five kinases, 36 peptide substrates 
are selected that show a spectrum of specific and 
overlapping specificities. 

Lafferty discloses at e.g., col. 31, lines 41-49 the 
conventionality of an array containing substrate-enzymes such as 
kinase. 

Accordingly, it would have been obvious to one having 
ordinary skill in the art at the time the invention was made to 
use in the array of Shalon the enzyme kinase as taught by Felder. 
Felder teaches that kinases have been shown to stimulate normal 
and neoplastic cell growth. To use the kinase in the array of 
Shalon would lead one having ordinary skill in the art in 
determining the kinase in the array responsible for neoplastic 
or normal cell growth. Furthermore, as taught by Lafferty an 
array containing a kinase is known in the art. [See also 
applicants' admission in the response at page 17, of the 
12/19/2006 REMARKS. Applicant states: compositions utilizing 
well-known and well-characterized classes of proteins, as in the 
presently claimed invention]. 

Response to Arguments 
Applicants note that Shalon is primarily directed to arrays 
comprising polynucleotides (see Examples 1-3), and only 
mentions in passing that arrays comprising proteins and 
enzymes could be constructed. 

In reply, in considering disclosure of a reference, it is 
proper to take into account not only specific teachings of 
the reference but also "inferences" which one skilled in 
the art would reasonably be expected to draw therefrom. In 
re Preda, 159 USPQ 342. Accordingly, the mention of protein 
array in Shalon suffices for the prima facie finding of 
obviousness. 

Felder discloses preparation of arrays comprising 
peptides that are substrates for kinases, not arrays 
comprising the kinases themselves: "[a] chimeric linker 
molecule is prepared in which a 25 base pair 
oligonucleotide complementary to one of the anchors is 
crosslinked to a peptide substrate of a tyrosine 
phosphokinase enzyme." Felder at column 44, lines 18-21 
(emphasis added). Thus, Felder does not disclose the 
preparation of arrays comprising 61 purified active kinases 
or functional kinase domains thereof, as recited in present 
claim 1. 

With regard to Lafferty, Applicants note that the arrays 
disclosed therein are limited to enzymes expressed in 
expression library cells, and that Lafferty does not 
disclose the purification of these enzymes prior to 
placement on a solid support, as recited in the presently 
claimed invention. 

In response, Felder is employed not for the purpose as 
argued rather for its disclosure of the known kinases. 
Shalon teaches the arrays of enzymes, to which the specific 
enzyme kinase would be prima facie obvious to position 
therein as Felder teaches the known kinases. Lafferty is 
also employed not for the purpose as argued. Please see the 
rejection above. Hence, the combined teachings of the prior 
art would lead one having ordinary skill in the art to the 
claimed array of purified kinase. 

When considering obviousness of a combination of known 
elements, the operative question is thus "whether the 
improvement is more than the predictable use of prior art 
elements according to their established functions." KSR 
International Co. v. Teleflex Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007). 



No claim is allowed. 

Any inquiry concerning this communication or earlier 
communications from the examiner should be directed to TERESA 
WESSENDORF whose telephone number is (571)272-0812. The 
examiner can normally be reached on flexitime. 

If attempts to reach the examiner by telephone are 
unsuccessful, the examiner's supervisor, Christopher Low can be 
reached on 571-272-0951. The fax phone number for the 
organization where this application or proceeding is assigned is 
571-273-8300. 

Information regarding the status of an application may be 
obtained from the Patent Application Information Retrieval 
(PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status 
information for unpublished applications is available through 
Private PAIR only. For more information about the PAIR system, 
see http://pair-direct.uspto.gov. Should you have questions on 
access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would 
like assistance from a USPTO Customer Service Representative or 
access to the automated information system, call 800-786-9199 
(IN USA OR CANADA) or 571-272-1000. 

/TERESA WESSENDORF/ 

Primary Examiner, Art Unit 1639 

DETAILED ACTION 
Status of Claims 

Claims 1-16, 93-101, 106, 107, 112-133, 138-159, 162, 164- 
165, 167, 169, 171, 173-175, 177, 181-186, 188, and 193-197 
are pending. 

Claims 12-16, 93-101, 106, 107, 112-133, 138-140, 142-159, 
162, 165, 167, 171, 175 and 196-197 are withdrawn from further 
consideration pursuant to 37 CFR 1.142(b), as being drawn to 
nonelected species, there being no allowable generic or linking 
claim. 

Claims 17-92, 102-105, 108-111, 134-137, 160-161, 163, 166, 
168, 170, 172, 176, 178-180, 187, 189-192 and 198 have been 
cancelled. 

Claims 1-11, 141, 164, 169, 173, 174, 177, 181-186, 188 and 
193-195 are under consideration in this Office Action. 

Withdrawn Objection and Rejections 

In view of applicants' arguments and amendments to the 
claims, the 35 USC 112, second paragraph rejections have been 
withdrawn. 

The text of those sections of Title 35, U.S. Code not 
included in this action can be found in a prior Office action. 

Claim Rejections - 35 USC §112 

Claims 1-11, 141, 164, 169, 173, 174, 177, 181-186, 188 and 
193-195, as amended, are rejected under 35 U.S.C. 112, first 
paragraph, as failing to comply with the written description 
requirement. The claim(s) contains subject matter, which was 
not described in the specification in such a way as to 
reasonably convey to one skilled in the relevant art that the 
inventor(s), at the time the application was filed, had 
possession of the claimed invention for reasons of record as 
reiterated below. 

Written Description Rejection 

A "written description of an invention involving a 
chemical genus, like a description of a chemical species, 
requires a precise definition, such as by structure, formula 
[or] chemical name of the claimed subject matter sufficient to 
distinguish it from other materials". University of California 
v. Eli Lilly and Co., 43 USPQ2d 1398, 1405 (1997), quoting 
Fiers v. Revel, 25 USPQ2d 1601, 1606 (Fed. Cir. 1993). 

The claimed invention is drawn to a positionally 
addressable array comprising a plurality of different 
substances on a solid support, with each different substance 
being at a different position on the solid support, wherein 
the density of the different substances on the solid support 
is at least 100 different substances per cm2, and wherein the 
plurality of different substances comprises 61 purified active 
kinases or functional kinase domains thereof of a mammal, 61 
purified active kinases or functional kinase domains thereof 
of a yeast, or 61 purified active kinases or functional kinase 
domains thereof of a Drosophila. 

The specification fails to provide an adequate written 
description of 61 purified kinase and functional domains thereof 
from any organism such as mammals. The specification provides 
general statements of these various kinases in an organism which 
are not a detailed description of the invention. The detailed 
description in the specification (Example I, page 27) describes 
the 122 kinase genes specifically from the yeast genome, not 
from the broad claimed any kinase from any type of mammals or 
Drosophila or functional domains thereof. A written description 
of a single species would not be a written description for the 
genus as claimed. At the time of applicants' invention kinases 
in any mammals included in the huge scope of the claim has not 
been fully characterized such that it has been positioned in an 
array without denaturing the purified protein. A skilled artisan 
recognizes that one cannot rule out the possibility that kinases 
other than the desired enzyme can contaminate any type of 
purification preparations. Notwithstanding this, the kind/type 
and preparation of a substrate compatible with the purified 
protein is also a factor to consider for a purified protein to 
be active in an array. Furthermore, "although most of the 
kinases were active in [our] assays, several were not. 
Presumably, our preparations of these latter kinases either lack 
sufficient quantities of an activator or were not purified under 
activating conditions. For example, Cdc28 which was not active 
in [our] assays, might be lacking its activating cyclins. For 
the case of Hog1, cells were treated with high salt to activate 
the enzyme...." (paragraph [0161] of the instant specification, 
publication no. 20030207467). 

Attention is also drawn to the numerous prior art cited by 
applicants, inter alia, the Anderson reference, which teaches 
the numerous unforeseeable factors of a purified kinase 
positioning in an array. 

Anderson states that: 

...protein microarrays have still not found widespread use, 
in part because producing them is challenging. 
Historically, it has required the high-throughput 
production and purification of protein, which then must be 
spotted on the arrays. Once printed, concerns remain about 
the shelf life of proteins on the arrays. 

Shaw et al. (Drug Discovery and Development, Exhibit B) 
concurs, stating that: 

"[i]t was first thought that protein biochips would just be 
an extension of DNA microarrays, and that hasn't exactly 
panned out," says Bodovitz. That's because proteins have 
proven to be much trickier to work with in array format 
than their genomic counterparts. First of all, there are 
issues of stability. Membrane proteins, for example, make 
up the majority of potential drug targets, but they're 
particularly challenging to stabilize. Then there's the 
choice of immobilization technique, which determines how 
well the target protein presents itself to the capture 
agent, and the problem of nonspecific binding. And of 
course, proteins are inherently unstable outside their 
natural habitat of living cells, making them much more 
challenging than DNA to tag and manipulate. 

An applicant shows possession of the claimed invention by 
describing the claimed invention with all of its limitations 
using such descriptive means as words, structures and formulas 
to show that the invention is complete. Lockwood v. American 
Airlines, Inc., 107 F.3d 1565, 1572, 41 USPQ2d 1961, 1966 (Fed. 
Cir. 1997); MPEP 2163. Herein, kinase has been described only in 
words. The characterization of the different kinases from one 
organism to another from the numerous kinases and numerous 
organisms has not been adequately described to distinguish one 
from the other. To date only a few organisms are fully 
characterized and the kinase region has not been fingerprinted 
in a partly or even fully characterized gene. The description 
lacks structural characterization of a purified kinase as 
generically claimed. It does not distinguish one kinase from 
another and/or one organism from another positioned in any 
kind/type of substrate array such that one could reasonably 
expect the purified kinase to retain its active form. 

Response to Arguments 

Applicants submit that kinases and functional kinase 
domains from yeast, mammals and Drosophila were a well 
characterized group of proteins that were generally known, 
understood to be well conserved in structure and function, 
easily identified, and readily prepared and assayed by those of 
ordinary skill in the art on the priority date of the present 
application. Thus, such proteins were well known to those of 
ordinary skill in the art, and hence, a re-description of such 
proteins is not required under Capon. 

In reply, the claims are not drawn only to the alleged 
known, well characterized kinases from yeast, mammals and 
Drosophila or functional domain thereof. Rather, they are drawn 
to a kinase array for all or any kind of kinase from, e.g., 
mammals, or functional domain thereof, immobilized in every 
conceivable manner on any kind of solid support. If an appellant 
chooses to rely upon 
general knowledge in the art to render his disclosure enabling, 
the appellant must show that anyone skilled in the art would 
have actually possessed the knowledge, In re Lange, 644 F.2d 
856, 209 USPQ 288 (CCPA 1981), or would reasonably be expected to 
check the source which appellant relies upon to complete his 
disclosure and would be able to locate the information with no 
more than reasonable intelligence. Herein, there is no explicit 
description of the generically claimed array containing kinases 
from any kind of, e.g., mammals (except for humans as per the newly 
submitted Schweitzer declaration), let alone the functional 
domain thereof. Claims drawn to the use of known chemical 
compounds must have a corresponding written description only so 
specific as to lead one to that class of compounds. In re 
Herschler (CCPA 1979) 200 USPQ 711. 

Applicants submit that the level of skill and knowledge 
relating to protein kinases and their functional domains was 
very high on the priority date of the present application and a 
person of ordinary skill in the art would readily understand 
that, indeed, Applicants were clearly in possession of a 
positionally addressable array comprising 61 purified active 
kinases or functional kinase domains thereof of a mammal, yeast 
or Drosophila. 



In reply, the level of skill and knowledge in the art is 
high, but so also is the unpredictability in the gene art. This is 
demonstrated by no less than applicants for the very specific 
yeast ORF genes containing kinase, not the functional domain 
thereof. Applicants state at, e.g., page 32, lines 14-17: 

"14 of 15 119 GST::kinase samples were not detected 
by immunoblotting analysis. Presumably, these proteins 
are not stably overproduced in the pep4 protease-deficient 
strain used, or these proteins may form 
insoluble aggregates that do not purify using our 
procedures" 

Please see also the various prior art references concurring with 
applicants' findings above. For example, Anderson states: 

...protein microarrays have still not found widespread use, 
in part because producing them is challenging. 
Historically, it has required the high-throughput 
production and purification of protein, which then must be 
spotted on the arrays. Once printed, concerns remain about 
the shelf life of proteins on the arrays. 

Applicants state that the specification describes the use 
of positionally addressable arrays containing proteins and 
functional domains of the proteins from organisms including 
mammals, yeast and Drosophila (published [0058]), and provides a 
working example describing the production of a protein chip 
containing over 100 functional yeast kinases and yeast kinase 
domains (See Example I). 

In reply, the detailed description of the yeast array as 
described in Example I is not controverted. The issue is, e.g., 
an array containing not only yeast but any type of mammal kinase 
and/or Drosophila (sequence or non-sequence) (and at least 61 
kinases in an array). 

There is nothing in the description or any prior art 
teachings of an array containing an immobilized kinase from 
yeast, mammals and Drosophila or functional domain thereof. It 
does not describe that the 61 kinases present in yeast can be 
extrapolated to, or are similarly present in, the different numbers 
and kinds of kinases found in any kind of mammals or Drosophila. 

Attention is again directed to the different prior art 
cited above as to the high unpredictability in the art for an 
array containing protein such as kinase. 

Claim Rejections - 35 USC §112 

Enablement Rejection 

Claims 1-11, 141, 164, 169, 173, 174, 177, 181-186, 188 and 
193-195, as amended, are rejected under 35 U.S.C. 112, first 
paragraph, because the specification, while being enabling for 
yeast protein kinases of the Ser/Thr and tyrosine kinase 
family, does not reasonably provide enablement for the broad 
scope of an array of 61 kinases and functional domain kinase 
from an organism such as a mammal, yeast or Drosophila. The 
specification does not enable any person skilled in the art to 
which it pertains, or with which it is most nearly connected, to 
make and use the invention commensurate in scope with these 
claims for reasons as repeated below. 

The claimed array comprises a broad genus of compositions. 
The claimed different substances encompass any members of the 
protein kinase from the organism of mammal, yeast, or Drosophila 
which is broader than the enabling disclosure. The claimed array 
represents enormous scope because the claims do not place any 
limitations on the kind, number and/or length of kinase either 
singly from one family of organism or a combination(s) from the 
different numerous recited organisms. The instant specification 
is directed to an array comprising a plurality of different 
yeast protein kinases, specifically 122 different yeast protein 
kinases of the Ser/Thr and tyrosine kinase family members (see 
specification: example I, pg. 21, line 19 thru pg. 35, line 20; 
example II, pg. 41, line 19 thru pg. 43, line 6). The 
specification does not provide reasonable assurance to one 
skilled in the art that the 61 kinases found in the yeast could 
be found in any or all of the organisms such as mammals 
especially the functional domain thereof. It is not apparent 
from the specification whether the same number of kinases or the 
kind of kinases or functional domain thereof can be found in any 
other organisms and made into an array. It is not apparent from 
the disclosure as to the functional domain of the kinase and the 
specific function attributed to said kinase positioned on the 
array. The general knowledge and level of skill in the art do 
not supplement the omitted description because specific, not 
general, guidance is what is needed. In a highly unpredictable 
art, such as biotechnology, where one cannot predict whether one 
species would be predictive of the huge scope of the claim, one 
cannot make an a priori statement without any experimental studies. 
Factors such as the compatibility of the array with the 
substrate and compounds disposed therein, the compounds 
(kinases) themselves and other unpredictable variables can affect 
the active form of any kinase. Thus, one cannot predict from a 
single species its correspondence or extrapolation to the genus, 
as claimed. 

Response to Arguments 

Applicants cite the Schweitzer Declaration at pages 3-4, 
section 8, the Snyder Declaration and Replies to the Office 
Action filed December 21, 2007, and April 20, 2009 to support 
their enablement position. It is asserted that the replies and 
declaration show that protein kinases and functional kinase 
domains used in the claimed positionally addressable arrays were, 
at the time this application was filed, all well-known and 
well-characterized. Reference was made to the Hunter and Plowman 
reference. 

In reply, as stated above the claims are drawn to an array 
of the well-known and well-characterized kinases in yeast and 
etc. and not solely to the well-known and characterized yeast, 
per se. 

Hunter, like the specification, is drawn only to kinase in 
yeast, which is not present in an array. Mammals or Drosophila 
have not been described by Hunter, nor is it taught that the well 
known kinase from yeast, let alone the functional domain, applies 
to other kinases as in mammals or Drosophila. 

Applicants further assert that it was also well known at 
the time of filing of this application that kinases are highly 
conserved such that homologs exist between yeast and many other 
organisms. See Manning et al., "The Protein Kinase Complement of 
the Human Genome," Science 298:1912-1934 (2002) at page 1913, 
first column, first paragraph (cited in Applicants' 6th SIDS 
submitted on April 20, 2009). Furthermore, the regulation of the 
different kinases and the phosphorylation motifs of substrates 
recognized by related kinases are often the same, indicating 
that they behave similarly biochemically. See id. Moreover, as 
the structure and function of kinases were known to be highly 
conserved, it was also known that human kinases can be 
substituted for yeast kinases, illustrating the highly conserved 
structure-function relationships known to exist for kinases on 
the priority date of the application. See Lee and Nurse, 
"Complementation used to clone a human homologue of the fission 
yeast cell cycle control gene cdc2," Nature 327:31-35 (1987) 
(cited in Applicants' 6th SIDS submitted on April 20, 2009). 
Therefore, on the priority date of the present invention, the 
state of the art relating to protein kinases was extremely high 
and was such that a person of skill in the art, in the fields of 
for example, protein purification, proteomics and analysis, 
enlightened by the teaching of the specification would have 
appreciated that no more than routine experimentation would be 
required to make and use the claimed arrays containing purified 
active kinases or functional kinase domains from a mammal, yeast 
or Drosophila. 

In reply, the arguments are not drawn again to kinase in 
yeast per se as well as to its properties or homologs thereof. 
Rather, the claim is to a kinase of yeast in an array. As Shaw 
et al. stated above: 

"[i]t was first thought that protein biochips would just be 
an extension of DNA microarrays, and that hasn't exactly 
panned out," says Bodovitz. That's because proteins have 
proven to be much trickier to work with in array format 
than their genomic counterparts. First of all, there are 
issues of stability. Membrane proteins, for example, make 
up the majority of potential drug targets, but they're 
particularly challenging to stabilize. Then there's the 
choice of immobilization technique, which determines how 
well the target protein presents itself to the capture 
agent, and the problem of nonspecific binding. And of 
course, proteins are inherently unstable outside their 
natural habitat of living cells, making them much more 
challenging than DNA to tag and manipulate. (Emphasis 
added.) 

While the kinases are alleged to be homologs, the fact 
remains that the purification technique and other experimental 
conditions/steps for yeast would be different from those for any 
type of mammal. See applicants' disclosure at, e.g., page 32, 
lines 14-17 as to the unpredictable or unexpected failure of 
obtaining purified yeast in the ORF region alone (i.e., not to its 
alleged homologs): 

"14 of 15 119 GST::kinase samples were not detected 
by immunoblotting analysis. Presumably, these proteins 
are not stably overproduced in the pep4 protease-deficient 
strain used, or these proteins may form 
insoluble aggregates that do not purify using our 
procedures" (Emphasis added.) 

Applicants submit that methods useful for confirming kinase 
activity of the proteins on the claimed arrays are described in 
the specification and were otherwise well known as of the filing 
date of the present application (see, e.g., Example I of the 
specification). See also Snyder Declaration at pages 3-4, 
section 7. Thus protein kinases, functional kinase domains and 
methods of assaying these proteins, were well-known in the art 
on the priority date of the present invention. 

In reply, Example I states that the tyrosine kinase family 
members do not exist although seven protein kinases that 
phosphorylate have been reported. Applicants' arguments that 
arrays from any organisms are simple and straightforward are mere 
arguments, absent evidence to the contrary, which cannot be 
substituted for enabling disclosure. 

Applicants state that, for enablement, a specification need not 
teach, and preferably omits, information that is well-known to 
those of ordinary skill in the art. See Hybritech Inc. v. 
Monoclonal Antibodies, Inc., 802 F.2d 1367, 1384 (Fed. Cir. 
1986); Lindemann Maschinenfabrik v. American Hoist and Derrick, 
730 F.2d 1452, 1463 (Fed. Cir. 1984); In re Wands, 8 USPQ2d 
1400, 1402 (Fed. Cir. 1988). 

In reply, the Federal Circuit has cautioned against over 
reliance on the assertion that everything needed to practice the 
full scope of the claims was "known in the art" and that a 
patent need not teach, and preferably omits, what is well known 
in the art. See Genentech Inc. v. Novo Nordisk A/S, 108 F.3d 1361, 
1366, 42 USPQ2d 1001, 1005 (Fed. Cir. 1997): "[T]hat general, 
oft-repeated statement is merely a rule of supplementation, not 
a substitute for a basic enabling disclosure. It means that the 
omission of minor details does not cause a specification to fail 
to meet the enablement requirement. . . . It is the specification, 
not the knowledge of one skilled in the art that must supply the 
novel aspects of an invention in order to constitute adequate 
enablement." Herein the specification teaches the kinase of 
yeast only in the ORF region (not even of the full length yeast 
sequence. Do the kinase and its functional domain exist only in 
the ORF region and not in any other region(s) of the yeast 
gene?) Applicants point to nothing in the specification that 
would indicate to the contrary (i.e., kinase array of, e.g., any 
mammal or Drosophila in any region of the full length sequence 
or unsequenced protein). 

Applicants rely upon the newly submitted Schweitzer 
Declaration, in addition to the Snyder declaration. Schweitzer 
is stated to describe the preparation of functional human 
protein kinase arrays using the teaching in the present 
specification. In addition, the functional human kinase domains 
used in the positionally addressable arrays prepared by 
Schweitzer that form the basis of the present claims were, on 
the priority date of the present application, well-known, well- 
characterized proteins with purified human kinases (see the 
Schweitzer Declaration, at page 3, section 8). As discussed in 
detail in the Schweitzer Declaration at pages 4-7, sections 9- 
13, researchers enlightened by the information set forth in the 
specification, have used the homologies that were known to exist 
between human and yeast kinases, to informatically identify 
genes for human kinases and functional domains, clone these 
genes, express these genes in Sf9 insect cells, lyse the cells 
and purify the human kinases and functional domains. (See also 
Protein-Protein Interaction Profiling on Invitrogen ProtoArray™ 
High-Density Protein Microarrays, Application Note, 
Invitrogen, page 2, column 2, paragraph 3 to page 2, column 3, 
paragraph 1 (2005) (hereinafter "Protein-Protein Interaction 
Profiling," Exhibit B)). According to the Schweitzer 
Declaration, over 90% of protein kinases expressed and purified 
using the methods described in the specification were active as 
demonstrated by catalytic activity including 
autophosphorylation, wherein a protein kinase phosphorylates 
itself. See id. 

In reply, that the Schweitzer declaration teaches the 
purification of kinase from humans is not controverted. The 
Schweitzer declaration describes kinase allegedly obtained from 
human, not from any mammals or Drosophila. It is not clear from 
the Schweitzer declaration just which part of the present 
disclosure has been applied from the yeast to the human kinase 
array, except to get its homology therefrom. The instant 
specification uses a chip for its yeast array; Schweitzer does not 
teach a biochip. 

Exhibit B presents a description for the human array which 
does not seem to fall or correspond to the description for 
yeast. Furthermore, Exhibit C of the Schweitzer declaration 
states at page 1: 

The family of human protein kinases consists of more than 
500 members of which only a fraction have been 
characterized to date. Much is still not known about the 
biological function of many kinases, the protein substrates 
that are phosphorylated by these kinases, or the roles of 
these kinases and substrates in disease. 

Thus, Schweitzer has not extrapolated or predicted its 
findings to any other family of human protein kinases, which 
consist of more than 500 members of which only a fraction has 
been characterized to date. 



Claim Rejections - 35 USC § 102/ § 103 



Claims 1-11, 141, 181-186, 188, and 193-195, as amended, 
are rejected under 35 U.S.C. 102(a) as anticipated by or, in the 
alternative, under 35 U.S.C. 103(a) as obvious over Uetz et al 
(Nature, 2/10/2000) for reasons of record as reiterated below. 
Uetz et al, throughout the reference, teach a protein array 
representing yeast genome encoded proteins (see Abstract of 
the reference). The reference teaches fusing roughly 6000 
potential ORFs (genes) from yeast genome (which comprises 
approximately 6000 genes) (see page 623, left col., 1st 
paragraph, and page 624, left col., 2nd paragraph). Uetz 
teaches the yeast proteins were expressed in 96-well assay 
plates (page 624, left col., bottom of 2nd paragraph), which 
reads on a solid support of the addressable array of claim 1 
because each well of the plates would have defined (or 
addressable) positions. The reference also teaches that each of 
the proteins encoded by a gene is expressed individually in 
individual wells of the plates as shown in Figure 1 of the 
reference (page 624), which reads on each protein being at a 
different position on a solid support of claim 1, for example. 
The claimed kinase present in the array would have been 
inherent to the yeast array taught by Uetz since yeast 
inherently contains kinase in its structure or would have been 
obvious to determine given the identified genome of yeast as 
taught by Uetz. 

Where the claimed and prior art products are identical or 
substantially identical, or are produced by identical or 
substantially identical processes, the PTO can require an 
applicant to prove that the prior art products do not 
necessarily or inherently possess the characteristics of 
his claimed product. See In re Ludtke, supra. Whether the 
rejection is based on "inherency" under 35 USC 102, on 
"prima facie obviousness" under 35 USC 103, jointly or 
alternatively, the burden of proof is the same as is 
evidenced by the PTO's inability to manufacture products or 
to obtain and compare prior art products. See In re Brown, 
59 CCPA 1036, 459 F.2d 531, 173 USPQ 685 (1972); In re 
Best, 195 USPQ 430 (CCPA 1977). 



Response to Arguments 

Applicants state that Uetz does not disclose the claimed 
arrays comprising purified kinases or functional kinase domains. 
Applicants assert that Uetz pg 623, col. 1, last paragraph - 
col. 2, first paragraph provides further evidence that the 
arrays in Uetz did not consist of purified proteins having 
kinase activity. 



In reply, attention is again drawn to the disclosure of 
Uetz at, e.g., the paragraph bridging col. 1 and col. 2, which 
recites a purified yeast ORF (referring to reference 7) which 
contains the kinase region: 

To examine protein activity in a format that 
allows the assay of every predicted ORF, we 
constructed an array of hybrid proteins. At least two 
general types of protein array may be envisioned: 
those composed of living transformants and those 
composed solely of the purified proteins (7). The 
two-hybrid array used here is a set of yeast colonies 
derived from about 6,000 individual transformants 

(Emphasis added.) 

Thus, Uetz discloses purified protein containing the same 
kinase from yeast. Even assuming that Uetz does not disclose 
(but Uetz does) a purified kinase as argued, however, the 
unpurified ORF of the yeast containing kinase would be the same 
as the claimed purified one. A purified kinase obtained from the 
same source as yeast merely further characterizes the known 
kinase present in the yeast. Applicants' use of the word 
comprising does not preclude the other elements present in the 
kinase contained in the ORF region of the yeast. As applicants 
stated above: 



Kinases and functional kinase domains from yeast, mammals 
and Drosophila were a well characterized group of proteins 
that were generally known, understood to be well conserved 
in structure and function, easily identified, and readily 
prepared and assayed by those of ordinary skill in the art 
are known. 



Applicants note that, even assuming the arrays disclosed 
in Uetz comprise 61 kinases, there is no disclosure in Uetz 
sufficient to render obvious the construction of an array of 61 
kinases or functional kinase domains, in which the array 
comprises kinases that are purified and active, as recited in 
present claim 1. 

Applicants agree that the claims are drawn to an array but 
assert that the limitation purified and active recited in the 
claims is not a process limitation, but rather a characteristic 
of the components of the array. 

In reply, as recognized by applicants, a purified kinase is 
but a further characteristic of the known (albeit, allegedly 
unpurified) compound. Thus, this further characterization does 
not make the compound, kinase, any different but only further 
characterizes said known kinase. It is well settled that there 
can be no patentable invention where novelty does not exist, 
albeit all of the properties of said compositions were not 
previously recognized. (Please see further the above statements 
of applicants that these kinases are known). 

Likewise, it is well settled in the art that where a 
substance having medicinal properties is produced, it becomes an 
immediate consideration to prepare substances in as pure a form 
as possible. Claim for known substance which differs from prior 
art only in degree, as for example in purity, is not patentable. 
See Ex parte Steelman and Kelly, 140 USPQ 189. 

Most of applicants' subsequent arguments rely on the 
responses filed on December 21, 2007 and the April 20, 2009 
replies and the Snyder declaration. Applicants state that "the 
following exemplary references describe the skepticism from 
those in the field regarding the preparation of protein arrays 
both before and after the time of filing of the present 
application, as well as some of the problems regarding 
preparation of protein arrays comprising large numbers of 
purified active proteins that were overcome by the presently 
claimed invention"; the Schweitzer declarations and the reference 
to, e.g., Anderson, all relate to the problems encountered in 
the making of the array and how the problems have been overcome. 
(Emphasis added.) However, as stated above and in the previous 
Office actions, the claims are drawn to known kinases, the 
immobilization of which onto a solid surface is well known in the 
art as taught by Uetz above. 

Applicants state that the examiner appears to agree that 
kinases of the various organisms such as mammals, Drosophila and 
yeast were well known at the time of filing the application. 
This is in stark contrast to the Examiner's contrary position 
noted above with regard to written description and enablement of 
the presently claimed invention. While Applicants agree that 
kinases of Drosophila, yeast and mammals were well known in the 
art at the time of filing the present application, it would not 
have been obvious to place these kinases on a positionally 
addressable array so that they were not only purified, but also 
active. 

In reply, no contradiction exists in the rejections under 
obviousness and enablement/written description. These are two 
separate rejections. The enablement/written description rejections 
are based on the lack of description for the broad claimed genus 
and not on the yeast species containing kinase. This species is 
taught by the prior art and is included in the broad genus 
claim, hence anticipating or rendering obvious the broad genus 
claim. Applicants' further arguments relying on the Snyder 
declaration as to the method of making the array and Bussow are, as 
stated above, not commensurate with the claims, which recite 
simply the known kinases immobilized on solid support. 

Claims 1-11, 141, 181-186, 188, and 193-195 are rejected 
under 35 U.S.C. 103(a) as being unpatentable over Shalon (WO 
95/35505) in view of Felder et al (USP 6458533) or Lafferty (USP 
6972183) for reasons of record as repeated below. 

Shalon discloses at e.g., page 12, lines 3-9: 

A microarray as an array of regions having a density of 
discrete regions of at least about 100/cm2, and preferably 
at least about 1000/cm2. The regions in a microarray have 
typical dimensions, e.g., diameters, in the range of 
between about 10-250 µm, and are separated from other 
regions in the array by about the same distance. 

Shalon discloses at e.g., page 30, line 30 up to page 32, line 
15: 

Sheets of plastic-backed nitrocellulose where each 
microarray could contain, for example, 100 DNA fragments 
representing all known mutations of a given gene. The 
region of interest from each of the DNA samples from 95 
patients could be amplified, labeled, and hybridized to the 
96 individual arrays with each assay performed in 100 
microliters of hybridization solution... In addition to the 
genetic applications listed above, arrays of... enzymes... [were 
prepared]. 

Shalon discloses an array of enzymes and not kinase as 
claimed. However, Felder discloses at Example 18: 

Kinases are enzymes that attach a phosphate to proteins. 
Many have been shown to stimulate normal and neoplastic 
cell growth. Hence, compounds that inhibit specific kinases 
(but not all kinases) can be used to test whether the 
kinases are involved in pathology and, if so, to serve as 
starting points for pharmaceutical development... Each kinase 
has substrates that are partially identified, as short 
peptides that contain a tyrosine. Some of the kinase 
specificities overlap so that different kinases may 
phosphorylate some peptides equally but others 
preferentially. For the five kinases, 36 peptide substrates 
are selected that show a spectrum of specific and 
overlapping specificities. 

Lafferty discloses at e.g., col. 31, lines 41-49 the 
conventionality of an array containing substrate-enzymes such as 
kinase. 

Accordingly, it would have been obvious to one having 
ordinary skill in the art at the time the invention was made to 
use in the array of Shalon the enzyme kinase as taught by Felder. 
Felder teaches that kinases have been shown to stimulate normal 
and neoplastic cell growth. To use the kinase in the array of 
Shalon would lead one having ordinary skill in the art to 
determine the kinase in the array responsible for neoplastic 
or normal cell growth. Furthermore, as taught by Lafferty, an 
array containing a kinase is known in the art. [See also 
applicants' admission in the response at page 17 of the 
12/19/2006 REMARKS. Applicant states: compositions utilizing 
well-known and well-characterized classes of proteins, as in the 
presently claimed invention]. 



Response to Arguments 



Applicants argue that Shalon is primarily directed to 
arrays comprising polynucleotides (see Examples 1-3), and only 
mentions in passing that arrays comprising proteins and enzymes 
could be constructed. Furthermore, Felder discloses preparation 
of arrays comprising peptides that are substrates for kinases, 
not arrays comprising the kinases themselves. Thus, Felder does 
not disclose the preparation of arrays comprising 61 purified 
active kinases or functional kinase domains thereof, as recited 
in present claim 1. With regard to Lafferty, Applicants note 
that the arrays disclosed therein are limited to enzymes 
expressed in expression library cells, and that Lafferty does 
not disclose the purification of these enzymes prior to 
placement on a solid support, as recited in the presently 
claimed invention. Applicants rely upon the Snyder declaration 
as support that Lafferty does not disclose the purified enzyme 
on a solid support. Lafferty is alleged to use an impure clone 
of an enzyme repeatedly passed through a capillary array several 
times.

In reply, much of applicants' arguments are drawn to the 
method of making the array, which is not commensurate in scope 
with the claims. Shalon, as recognized by applicants above, only 
mentions in passing arrays. However, these "passing remarks" or 
inferential teachings suffice the findings of obviousness. In 
considering disclosure of a reference, it is proper to take into 
account not only specific teachings of the reference but also 



Application/Control Number: 09/849,781 Page 30 

Art Unit: 1639 

"inferences" which one skilled in the art would reasonably be 
expected to draw therefrom. In re Preda 159 USPQ 342. 

The disclosure of Lafferty as discussed in the Snyder 
declaration of passing the clone several times into a capillary 
array and producing an optically detectable signal would 
indicate a purified product to enable detection of the clone. 
The claimed number of 61 kinase is dependent upon the organism 
and location from which the kinase is contained. Thus, this 
number may be arbitrary considering that the kinase is only 
derived from the ORF region of yeast. To determine the number of 
kinases in an organism as yeast in a specific location would be 
within the ordinary skill in the art, as evidenced from the 
various well known kinases in a protein sequence. 

Applicants cannot attack the references individually when 
the rejection is based on combination of references. Felder and 
Lafferty are employed for its disclosure of purified kinase, as 
claimed, not that it has to teach purifying the kinase prior to 
immobilization, otherwise it would be anticipatory rejection. 
Shalon teaches ORF containing kinase on an array but does not 
expressly teach, albeit implicitly, the purified kinase hence, 
the application of the secondary references Felder and Lafferty 
that renders the claim prima facie obvious. It would be within 
one having ordinary skill in the art at the time the invention 
was made to position a known compound as kinase into an array, 
as taught by Shalon. There is nothing new and unobvious in mere 
positioning a known compound in e.g., array, when in nature 
these kinases are inherently arrayed or attached to e.g., a 
membrane, which would read on a substrate of an array. 

Applicants assert that it was unexpected that kinases and 
functional kinase domains of these kinases could be purified and 
placed on a solid support to form an array, and that these 
kinases and kinase domains would retain their kinase activity. 
As detailed in the Snyder Declaration and the Schweitzer 
Declaration, it is only after the guidance provided in the 
present specification that a person of ordinary skill in the art 
would consider it possible to generate the presently claimed 
arrays. As discussed above and in the Schweitzer Declaration, 
Applicants respectfully submit that at the time of filing of the 
present application, it was unexpected that kinases and 
functional kinase domains of these kinases could be purified and 
placed on a solid support to form an array, it was also 
unexpected that the purified kinases and functional kinase 
domains of these kinases would retain their activity when placed 
onto the array. 

In reply, positioning of a known compound, be it purified 
or not, would be expected since the prior art, Shalon has 
successfully applied an enzyme protein in an array. There is 
nothing novel or unobvious of mere positioning or attaching a 
known compound, as kinase, as admitted by applicants above, in 
an array. The numerous advantages derived in arraying a known 
compound e.g., high throughput screening would provide 
motivation to one having ordinary skill in the art at the time 
of filing. One would have a reasonable expectation of success in 
immobilizing the yeast ORF containing kinase in an array as 
successfully made by Shalon and others in the prior art. 

[Applicants' arguments above are mostly drawn to what 
appears to be the method of making an array immobilized 
with purified kinase. Perhaps, this might very well be 
where the novelty resides. It is therefore suggested that 
applicants draft/amend the claims to recite a method of 
making/using the array. The method claim may be an 
allowable subject matter in view of the alleged and argued 
unexpected results of the method of purifying and attaching 
kinase to a solid support.] 

No claim is allowed. 



Conclusion 

THIS ACTION IS MADE FINAL. Applicant is reminded of the 
extension of time policy as set forth in 37 CFR 1.136(a). 

A shortened statutory period for reply to this final action 
is set to expire THREE MONTHS from the mailing date of this 
action. In the event a first reply is filed within TWO MONTHS 
of the mailing date of this final action and the advisory action 
is not mailed until after the end of the THREE-MONTH shortened 
statutory period, then the shortened statutory period will 
expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated 
from the mailing date of the advisory action. In no event, 
however, will the statutory period for reply expire later than 
SIX MONTHS from the mailing date of this final action. 



This application contains claims 12-16, 93-101, 106, 107, 
112-133, 138-140, 142-159, 162, 165, 167, 171, 175 and 196-197 
drawn to a nonelected invention. A complete reply to the final 
rejection must include cancellation of nonelected claims or 
other appropriate action (37 CFR 1.144). See MPEP § 821.01. 



Any inquiry concerning this communication or earlier 
communications from the examiner should be directed to TERESA 
WESSENDORF whose telephone number is (571)272-0812. The 
examiner can normally be reached on flexitime. 

If attempts to reach the examiner by telephone are 
unsuccessful, the examiner's supervisor, Christopher Low can be 
reached on 571-272-0951. The fax phone number for the 
organization where this application or proceeding is assigned is 
571-273-8300. 

Information regarding the status of an application may be 
obtained from the Patent Application Information Retrieval 
(PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status 
information for unpublished applications is available through 
Private PAIR only. For more information about the PAIR system, 
see http://pair-direct.uspto.gov. Should you have questions on 
access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would 
like assistance from a USPTO Customer Service Representative or 
access to the automated information system, call 800-786-9199 
(IN USA OR CANADA) or 571-272-1000. 

/TERESA WESSENDORF/ 

Primary Examiner, Art Unit 1639 

Declaration of Barry Schweitzer, Ph.D. Under 37 C.F.R. § 1.132 



The undersigned, Barry Schweitzer, residing at 459 Maple Avenue, Cheshire, 
CT, USA, declares and states as follows: 

1. I am currently employed by Life Technologies Inc. (hereinafter "LTI"), a 
licensee of the above-captioned application. I hold the position of Director of 
Integrated Technologies, Molecular Biology Systems Division. My credentials are 
provided in the curriculum vitae that is attached to this declaration as Exhibit A. I 
received my Ph.D. degree in Pharmacology from Yale University. As seen from my 
attached curriculum vitae, I have published many papers related to protein microarrays. 
Based on my education and experience, I am an expert in the field of yeast and human 
genomics, proteomics, and molecular genetics. 

2. I have reviewed and am familiar with U.S. Application No. 09/849,781, 
(hereinafter "the 781 application") filed on May 4, 2001, the Office Action dated July 7, 
2009 ("the Office Action"), issued by the U.S. Patent and Trademark Office in the 

Claim 188. (Previously presented) The positionally addressable array of claim 
1, wherein the kinases are yeast kinases. 
Claims 189-192. (Canceled). 

Claim 193. (Previously presented) The positionally addressable array of claim 
1, wherein the different substances are 61 purified active kinases. 

Claim 194. (Currently amended) The positionally addressable array of claim 
193, wherein the kinases are members of the serine/threonine kinase family members, 
members of the tyrosine kinase family members, or the kinases are members of the 
serine/threonine kinase family and members of the tyrosine kinase family members. 

Claim 195. (Currently amended) The positionally addressable array of claim 
1, wherein the functional kinase domains are functional kinase domains of members of 
the serine/threonine kinase family members, functional kinase domains of members of 
the tyrosine kinase family members, or wherein the functional kinase domains comprise 
functional kinase domains of kinases that are members of the serine/threonine kinase 
family members and functional kinase domains of kinases that are members of the 
tyrosine kinase family members. 

Claim 196. (Withdrawn) The positionally addressable array of claim 1, 
wherein the kinases or functional kinase domains are recombinant proteins. 

Claim 197. (Withdrawn) The positionally addressable array of claim 196, 
wherein the recombinant proteins are [illegible]. 

Claim 198. (Canceled). 

present application, and the currently pending claims, filed in the Reply to Office Action 
with this declaration. 

3. The 781 application presently claims a positionally addressable array 
comprising a plurality of different substances on a solid support, with each different 
substance being at a different position on the solid support, wherein the density of the 
different substances on the solid support is at least 100 different substances per cm², and 
wherein the plurality of different substances comprises 61 purified active kinases or 
functional kinase domains thereof of a mammal, 61 purified active kinases or functional 
kinase domains thereof of a yeast, or 61 purified active kinases or functional kinase 
domains thereof of a Drosophila. 

4. In the Office Action at pages 7-12, the Examiner asserts that the claims 
are allegedly not enabled. Specifically, the Examiner asserts that the specification does 
not enable the claimed array comprising 61 kinases and functional kinase domains from 
a mammal, yeast or Drosophila. 

5. In making this declaration, it is my opinion that at the time this 
application was filed, an ordinary practitioner in the field of genomics and proteomics 
would have been able to make and use the presently claimed positionally addressable 
arrays based on knowledge available to those in the field in combination with the 
detailed disclosure of the 781 application. It is also my opinion that any experimentation 
required for making and using the presently claimed positionally addressable arrays 
would have been routine and thus not inordinate or excessive. 

6. As discussed in detail below, the specification of the 781 application 
clearly provides sufficient disclosure for a typical practitioner in the field of proteomics 
to make and use positionally addressable arrays comprising 61 purified active kinases or 
functional kinase domains thereof of a mammal, 61 purified active kinases or functional 
kinase domains thereof of a yeast, or 61 purified active kinases or functional kinase 
domains thereof of a Drosophila. Based upon this disclosure, in combination with what 
was known at the time of filing of the present application, the use of kinases from 
various organisms, including mammals and Drosophila, in the preparation of the 
presently claimed positionally addressable arrays, would not have required undue 
experimentation, but rather, routine and straightforward experiments. 

8. Protein kinases and functional kinase domains used in the positionally 
addressable arrays that form the basis of the present claims were, at the time this 
application was filed, all well-known, and well-characterized. See Hunter and Plowman, 
"The protein kinases of budding yeast: six score and more," TIBS 22:18-22 (1997) at 
page 18, first column, first paragraph (cited in Applicants' 6th SIDS submitted on April 
20, 2009). It was also well recognized at the time of filing of this application that 
kinases are highly conserved such that homologs exist between yeast and many other 
organisms. See Manning et al., "The Protein Kinase Complement of the Human 
Genome," Science 298: 1912-1934 (2002) at page 1913, first column, first paragraph 
(cited in Applicants' 6th SIDS submitted on April 20, 2009). Furthermore, the regulation 
of the different kinases and the phosphorylation motifs of substrates of 
related kinases are often the same, indicating that they behave similarly biochemically. 
See id. Furthermore, as function is often highly conserved, human kinases can be 
substituted for yeast kinases, illustrating the highly conservative nature of these proteins. 
See Lee and Nurse, "Complementation used to clone a human homologue of the fission 
yeast cell cycle control gene cdc2," Nature 327: 31-35 (1987) (cited in Applicants' 6th 
SIDS submitted on April 20, 2009). Therefore, at the time this application was filed, the 
"state of the art" in protein kinases was such that a practitioner possessing a typical level 
of skill in proteomics, such skill including but not limited to gene cloning, protein 
expression and purification and analysis, would have readily recognized from the 781 
application, and the knowledge available in the art, that kinases of yeast, mammals and 
Drosophila could routinely be utilized to practice the presently claimed invention. 

9. The Examples set forth in the '781 application describe positionally 
addressable protein arrays made with purified, active kinases isolated from yeast. The 
specification describes that the purified, active yeast kinases were prepared by cloning 
yeast kinase genes into a high copy URA3 expression vector. See the '781 application at 
page 28, lines 28-30. The plasmids containing the vector sequences were transformed 
into yeast, and colonies were selected. Plasmids were rescued in E. coli, then 
transformed into the pep4 yeast strain for kinase protein purification. See id. at page 28 
line 36 to page 29 line 7. Purified, active kinases were attached to polydimethylsiloxane 
(PDMS) chips, and the chips comprising the purified, active yeast kinases were assayed 
for the phosphorylation of 17 different substrates to determine in vitro kinase activity. 
See id. at page 33 line 14 to page 34 line 19. See id. at page 29 line 26 to page 30 
line 21. 

10. In 2000, while working for Molecular Staging, developing antibody based 
arrays, I read Dr. Michael Snyder's paper entitled "Analysis of yeast protein kinases 
using protein chips." See Zhu et al., "Analysis of yeast protein kinases using protein 
chips," Nature Genetics 26: 283-289 (2000). Dr. Snyder's paper described work that was 
extremely impressive and unexpected. Dr. Snyder's discoveries, in fact, motivated me to 
accept a position at Protometrix, Branford, CT, a protein array company that had 
licensed Dr. Snyder's technology. When I joined Protometrix, work was already 
underway to use the information set forth in the 781 application, and known in the art at 
the time regarding protein kinases and their highly conserved homology between yeast 
and many other organisms, to develop protein arrays with active human kinases. 

11. Using the information set forth in the '781 application and the known 
homologies between human and yeast kinases, researchers at Protometrix were able to 
informatically identify genes for human kinases and utilize algorithms to identify many 
kinase functional domains. The genes were cloned into a recombinant bacmid 
(baculovirus shuttle vector), transfected into Sf9 insect cells, and cultured in 96 well 
plates. See Protein-Protein Interaction Profiling on Invitrogen ProtoArray™ High- 
Density Protein Microarrays, Application Note, Invitrogen page 2, column 2, paragraph 
3 (2005) (hereinafter "Protein-Protein Interaction Profiling," Exhibit B). In many cases, 
efforts were made to clone the full length protein kinases as well as the kinase active 
catalytic domains. All the proteins were expressed as N-terminal glutathione-S- 
transferase (GST) fusion proteins. See id. at page 2, column 1, paragraph 1. 

12. After a growth period, the cells were harvested and lysed under 
nondenaturing conditions as in the 781 application. See Protein-Protein Interaction 
Profiling at page 2, column 3, paragraph 1. See the 781 application at page 29, 
lines 8-16. The lysates were further loaded and eluted off of glutathione resin in 96 well 
plates under nondenaturing conditions. See Protein-Protein Interaction Profiling at page 
2, column 3, paragraph 1. Over 90% of protein kinases expressed and purified using the 
methods described in the 781 application were active as demonstrated by catalytic 
activity including autophosphorylation, wherein a protein kinase phosphorylates itself. 
See id. In contrast, only approximately 10% of protein kinases were active when other 
methods known at the time were utilized, including the use of high throughput methods 
with kinase expression in E. coli. 

13. Finally, using the purified kinases I have described, Protometrix 
developed a positionally addressable array comprising at least 100 different proteins on a 
solid support, with each different protein being at a different position on the solid 
support, wherein the density of the different proteins on the solid support was at least 
100 different proteins per cm², and contained at least 61 purified active human kinases or 
functional kinase domains, as presently claimed in the 781 application. Protometrix's 
technology, sold as Invitrogen's Human ProtoArray High Density Protein Microarrays™, 
are manufactured with thousands of different quality controlled recombinant human 
proteins and contain approximately 400 active human kinases and functional kinase 
domains. See B. Schweitzer et al. Development and Validation of Kinase Substrate 
Screening on Human ProtoArray High Density Protein Microarrays™, Invitrogen, Inc., 
page 2, column 1, paragraph 2 to page 3, column 1, paragraph 1 (2004) (hereinafter 
"Schweitzer") (Exhibit C). See also Access to the Human Proteome on a Microarray 
Scale, Invitrogen, Inc., Tables 1 & 2 (2007) (hereinafter "Access to Human Proteome") 
(Exhibit D). Several commercial versions of this array have been sold with between 
1,500 to 9,000 human proteins. The activity of the arrayed kinases has been verified, 
including demonstrated catalytic activity by incubating the arrays with radioactive ATP 
and measuring autophosphorylation. See Schweitzer, page 2, column 2, paragraph 1 to 
page 3, column 1, paragraph 1. 

14. It is my opinion that the 781 application provides a clear disclosure of 
how to make and use the presently claimed positionally addressable arrays. Those 
working in the field of proteomics at the time this application was filed were aware that 
yeast, mammalian and Drosophila kinases are highly conserved and homologous, and 
therefore that the teachings of purified, active yeast kinase arrays in the 781 Application 
could be used to routinely prepare similar arrays with mammalian and Drosophila 
kinases. Equipped with this information, researchers at Protometrix, using the 
information disclosed in the '781 application, were able to develop protein arrays 
comprising human kinases. See, e.g., the 781 application at pages 25-35. Thus, in my 
opinion, a typical practitioner in the field of proteomics would consider the production of 
an array using at least 61 purified, functional kinases from yeast as detailed in the '781 
application, to also allow for the routine production and use of arrays comprising 
purified, active kinases and functional kinase domains from other organisms, including 
mammals and Drosophila, as set forth in the presently claimed invention. 

PROFESSIONAL EXPERIENCE 

INVITROGEN CORPORATION (Now LIFE TECHNOLOGIES), Carlsbad, CA 2004 - Present 

Director, Integrated Technologies, Molecular Biology Systems Division - 2009 to present 

Director, Protein Analysis R&D - 2008 

Director, Protein Array R&D and Site Leader - 2006 - 2007 

Director, Protein Array R&D and Operations - 2004 - 2006 

Current responsibilities include the oversight of programs which span the traditional segments of the Molecular Biology Reagent 
Business, particularly programs that integrate instrumentation with consumables. Previous responsibilities included oversight of 
R&D and Services for Invitrogen's Protein Analysis product lines, including protein separation technologies, Western technologies, 
mass spectroscopy, and protein arrays. Additional responsibilities included site leadership of the Protein Array Center in Branford, 
CT, including R&D, Services, Manufacturing, Quality, and Facilities functions. Other responsibilities include budget preparation and 
implementation, intellectual property management, and oversight of academic, government, and industrial collaborations and 
contracts. Also participating in technology and intellectual property evaluations, business development, grant preparations, 
community relations, and presentations at national and international meetings. Reporting to the Vice President, R&D of the 
Molecular Biology Reagents Business Unit. 

Leadership accomplishments include: 

■ Led transfer of all operations from Branford, CT to Carlsbad on-time, under budget, and without loss of revenue 

■ Led global launch of several new multimillion dollar products 

■ Championed Lean Six Sigma Black Belt and Green Belt projects 

■ Led ISO 9001 Certification of Branford Site 

■ Led successful completion of multimillion dollar Biodefense projects in partnership with the United States Army 
Medical Research Institute for Infectious Diseases (USAMRIID) 

■ Authored or co-authored 11 publications, including paper in Nature 

■ Inventor or co-inventor on 10 new patent applications 

■ Presented at 14 international scientific conferences. 

PROTOMETRIX, INC., Branford, CT 2002 - 2004 

Senior Director, Technology - 2003-2004 
Director of Technology - 2002 - 2003 

Fifth person to join start-up biotechnology company. Director of a research and development operation providing high-throughput 
gene cloning, protein expression, protein purification, and protein microarray manufacturing for products, services, and discovery. 
Additional responsibilities included leading product development teams, leading technology and intellectual property diligence 
reviews, presenting to investors, coordinating industrial collaborations, and managing prosecution of company intellectual property. 
Reported to the Vice President, R&D. 

Leadership accomplishments included: 

• Led the Protometrix technical and IP diligence team during the acquisition of the company by Invitrogen Corp. 

■ Led the commercial launch of the world's first functional protein microarray product. 

■ Established the 1st manufacturing facility for the production of protein arrays. 

■ Built highly skilled team of scientists, engineers, and informatics specialists 

• Led the design and buildout of 14,000 s.f. state-of-the-art laboratory and company headquarters. 

MOLECULAR STAGING, INC., New Haven, CT 1998 - 2002 

Director of Proteomics - 2001 - 2002 
Section Head - 1998-2000 

Second person to join start-up biotechnology company. Director of a research and service operation providing high-throughput 
protein expression profiling data using proprietary protein microarray technology to academic, government, and corporate clients. 
Responsibilities included management of research personnel, budget preparation and implementation, business development, 
oversight of academic collaborators, preparation of publications and patent applications, presentations for investors, corporate 
partners and at national meetings. Four direct and 23 indirect reports. Reporting to Chief Operating Officer. 

Leadership accomplishments included: 

■ Successfully launched the world's first microarray-based protein expression profiling service. 

■ Developed the world's most advanced manufacturing facility for production of antibody microarrays. 

■ 8 publications, including publication in Nature Biotechnology of 1st application of antibody microarrays for protein 
expression profiling. 

■ 1 issued patent, and 4 patent applications. 

■ Led and coordinated the design and buildout of 46,000 s.f. of state-of-the-art proteomics laboratory. 

• Successfully moved an academic technology into an industrial setting, increasing sensitivity, robustness, and utility. 

■ Led project resulting in $9 MM equity investment by Fortune 100 Company. 

■ Gave technical presentations resulting in $40 MM 2nd round financing. 

WALT DISNEY MEMORIAL CANCER CENTER, Orlando, FL 1994 - 1998 

Division Director. Laboratory director of multidisciplinary research program in the structural biology of nucleic acids, proteins, and 
drugs involved in cancer and related diseases. Responsibilities included carrying out experiments and data analysis, project 
development, management of 15-20 research, administrative, and volunteer personnel, budget preparation and implementation, grant 
writing, preparation of publications, public relations, and mentoring of graduate, undergraduate and high school students. 
Scientific Director Molecular Diagnostics Clinical Laboratory. Responsibilities included business plan preparation and 
implementation, management of technical staff, technical consultant, clinical research director, and physician outreach. 

Leadership accomplishments included: 

• Established and directed a program utilizing multidimensional nuclear magnetic resonance (NMR) spectroscopy, and 
computational chemistry to determine high-resolution structures of proteins, nucleic acids, and drug complexes for the 
purpose of chemotherapeutic development. 

■ Established and directed a laboratory utilizing the most advanced molecular techniques to diagnose infectious diseases, 
cancer, and inherited diseases for patients of Florida Hospital (2nd largest number of admissions in U.S.). 

UNIVERSITY OF CENTRAL FLORIDA, Orlando, FL 1994 - 1998 

Assistant Professor 

Responsibilities included: research, Florida Hospital liaison, committee service, mentoring of graduate and undergraduate students, 
and teaching courses in Principles of Modern NMR Spectroscopy, Special Topics in Drug Development, and Advanced Biochemistry 
Laboratory. 

EARLIER POSITIONS: Associate Research Scientist (1991-1993), Yale University School of Medicine, and Research 
Associate (1990-1991), Memorial Sloan-Kettering Cancer Center 



OTHER EXPERIENCE 

GLYGENIX, INC., Cheshire, CT 2005 - 2007 

Member, Board [illegible] 
(GSD1). Its goal is to help find a cure for this disease by raising monies for GSD1-related research. 

THE EPISCOPAL CHURCH AT YALE, New Haven, CT 2000 - 2003 

Member, Board of Governors. The Episcopal Church at Yale (ECY) is a full time ministry of the Episcopal Church to students, 
staff and faculty at Yale. The ECY is governed by a Board of Governors of the Episcopal Church at Yale Corporation which is the 
legal entity of the Corporation in matters of contracts and other transactions with other institutions such as Yale University. 

PUBLICATIONS 

Original Articles. 

1. Schweitzer, B. I., and Bacopoulos, N. G. (1983) Reversible decrease in dopaminergic 3H-agonist binding after 6- 
hydroxydopamine and irreversible decrease after kainic acid. Life Sciences 32, 531-541. 

2. Merker, M., Rice, J., Schweitzer, B., and Handschumacher, R. E. (1983) Cyclosporine binding component in BW5187 
lymphoblasts and normal lymphoid tissue. Transplantation Proceedings 15, 2265-2270. 

3. DeReimer, S. A., Schweitzer, B., and Kaczmarek, L. K. (1985) Inhibitors of calcium-dependent enzymes prevent the 
onset of afterdischarge in the peptidergic bag cell neurons of Aplysia. Brain Research 340, 175-180. 

4. Srimatkandada, S., Schweitzer, B. I., Moroson, B. A., Dube, S., and Bertino, J. R. (1989) Amplification of a 
polymorphic dihydrofolate reductase gene expressing an enzyme with decreased binding to methotrexate in a human 
colon carcinoma cell line, HCT8R4, resistant to this drug. Journal of Biological Chemistry 264, 3524-3528. 

5. Schweitzer, B. I., Srimatkandada, S., Gritsman, H., Sheridan, R., Venkataraghavan, R., and Bertino, J. R. (1989) 
Probing the role of two hydrophobic residues in the active site of the human dihydrofolate reductase by site-directed 
mutagenesis. Journal of Biological Chemistry 264, 20786-20795. 

6. Dicker, A. P., Volkenandt, M., Schweitzer, B. I., Banerjee, D., and Bertino, J. R. (1990) Identification and 
characterization of a mutation in the dihydrofolate reductase gene from the methotrexate resistant Chinese hamster 
ovary cell line, Pro-3 MTXRIII. Journal of Biological Chemistry 265, 8317-8321. 

7. Li, W. W., Lin, J. T., Schweitzer, B. I., and Bertino, J. R. (1991) Mechanisms of sensitivity and natural resistance to 
antifolates in a methylcholanthrene-induced rat sarcoma. Molecular Pharmacology 40, 854-858. 

8. Li, W. W., Lin, J. T., Chang, Y. M., Schweitzer, B. I., and Bertino, J. R. (1991) Prediction of antifolate efficacy in a rat 
sarcoma model. International Journal of Cancer 49, 234-238. 

9. Lin, J. T., Tong, W. P., Trippett, T. M., Niedzwiecki, D., Tao, Y., Tan, C., Steinherz, P., Schweitzer, B. I., and Bertino, 
J. R. (1991) Basis for natural resistance to methotrexate in human acute non-lymphocytic leukemia. Leukemia 
Research 15, 1191-1196. 

10. Sapse, A.-M., Schweitzer, B. I., Dicker, A. P., Bertino, J. R., and Frecer, V. (1992) Ab initio studies of aromatic- 
aromatic and aromatic-polar interactions in the binding of substrate and inhibitor to dihydrofolate reductase. 
International Journal of Peptide and Protein Research 339, 18-23. 

11. Trippett, T., Schlemmer, S., Elisseyeff, Y., Wachter, M., Steinherz, P., Berman, E., Rosowsky, A., Schweitzer, B., and 
Bertino, J. R. (1992) Evidence for defective transport as a mechanism of acquired resistance to MTX in patients with 
acute lymphocytic leukemia. Blood 80, 1158-1162. 

12. Li, W. W., Lin, J. T., Schweitzer, B. I., Tong, W. P., Niedzwiecki, D., Brennan, M. F., and Bertino, J. R. (1992) 
Intrinsic resistance to methotrexate in human soft tissue sarcoma cell lines. Cancer Research 52, 1-6. 

13. Li, M.-X., Hantzopoulos, P. A., Banerjee, D.. Zhao, S. C, Schweitzer, B. I., Gilboa, E., and Bertino, J. R. (1992) 
Comparison of the expression of a mutant dihydrofolate reductase under control of different internal promoters in 
retroviral vectors. Human Gene Therapy 3, 381-390. 

14. Volkenandt, M., Dicker, A, P., Banerjee, D., Fanin, R., Schweitzer, B. I., Horikoshi, T., Danenberg, K, Dancnberg P.. 
and Bertino, J. R. (1992) Quantitation of gene copy number and mRNA using the polymerase chain reaction. 
Proceedings of the Society for Experimental Biology and Medicine 200, 1-6. 

15. Fanin, R., Banerjee, D., Volkenandt, M., Waltham, M., Li, W. W., Dicker, A. P., Schweitzer, B. I., and Bertino, J. R. 
(1993) Mutations leading to antifolate resistance in Chinese hamster ovary cells after exposure to the alkylating agent 
ethylmethanesulfonate. Molecular Pharmacology 13-21. 

16. Goker, E., Lin, J. T., Trippet, T., Elisseyeff, Y., Tong, W. P., Niedzwiecki, D., Tan, C., Steinherz, P., Schweitzer, B. I., 
and Bertino, J. R. (1993) Decreased polyglutamylation of methotrexate in acute lymphoblastic leukemia blasts in adults 
compared to children with this disease. Leukemia 7, 1000-1004. 

17. Li, W. W., Waltham, M., Tong, W., Schweitzer, B. I., and Bertino, J. R. (1993) Increased activity of gamma-glutamyl 
hydrolase in human sarcoma cell lines: a novel mechanism of intrinsic resistance to methotrexate (MTX). Advances in 
Experimental Medicine and Biology 338, 635-638. 

18. Kellogg, G. W., and Schweitzer, B. I. (1993) Two- and three-dimensional 31P-driven NMR procedures for complete 
assignment of backbone resonances in oligodeoxyribonucleotides. Journal of Biomolecular NMR 3, 577-595. 

19. Dicker, A. P., Waltham, M., Volkenandt, M., Schweitzer, B. I., Otter, G. M., Schmid, F. A., Sirotnak, F. M., and 
Bertino, J. R. (1993) Methotrexate resistance in an in vivo mouse tumor resulting from a novel mutation in the 
dihydrofolate reductase gene. Proceedings of the National Academy of Sciences (USA) 90, 11797-11801. 

20. Ercikan, E., Waltham, M., Dicker, A., Schweitzer, B., and Bertino, J. R. (1993) Effect of codon 22 mutations on 
substrate and inhibitor binding for human dihydrofolate reductase. Advances in Experimental Medicine and Biology 
338, 515-519. 

21. Banerjee, D., Schweitzer, B. I., Volkenandt, M., Li, M.-X., Waltham, M., Mineishi, S., Zhao, S.-C., and Bertino, J. R. 
(1994) Transfection with a cDNA encoding a Ser31 or Ser34 mutant human dihydrofolate reductase into Chinese 
hamster ovary and mouse marrow progenitor cells confers methotrexate resistance. Gene 139, 269-274. 

22. Li, M.-X., Banerjee, D., Zhao, S.-C., Schweitzer, B. I., Mineishi, S., Gilboa, E., and Bertino, J. R. (1994) Development 
of a retroviral construct containing a human mutated dihydrofolate reductase cDNA for hematopoietic stem cell 
transduction. Blood 83, 3403-3408. 

23. Zhao, S.-C., Li, M.-X., Banerjee, D., Schweitzer, B. I., Mineishi, S., Gilboa, E., and Bertino, J. R. (1994) Long-term 
protection of recipient mice from lethal doses of methotrexate by marrow infected with a double-copy vector retrovirus 
containing a mutant dihydrofolate reductase. Cancer Gene Therapy 1, 33-39. 

24. Schweitzer, B. I., Mikita, T., Kellogg, G. W., Gardner, K. H., and Beardsley, G. P. (1994) Solution structure of a DNA 
dodecamer containing the antineoplastic agent cytosine arabinoside: Determination by two- and three-dimensional 
NMR, restrained molecular dynamics, and complete relaxation matrix analysis. Biochemistry 33, 11460-11475. 

25. Schweitzer, B. I., Gardner, K. H., and Tucker-Kellogg, G. (1995) HeteroTOCSY-based experiments for measuring 
heteronuclear relaxation in nucleic acids and proteins. Journal of Biomolecular NMR 6, 180-188. 

26. Marshalko, S. J., Schweitzer, B. I., and Beardsley, G. P. (1995) Chiral chemical synthesis of DNA containing (S)-9- 
(1,3-dihydroxy-2-propoxymethyl) guanine (DHPG) and effects on thermal stability, duplex structure, and 
thermodynamics of duplex formation. Biochemistry 34, 9235-9248. 

27. Callihan, D., West, J., Schweitzer, B. I., and Logan, T. M. (1996) Simple, distortion-free homonuclear spectra of 
peptides and nucleic acids in water using excitation sculpting. Journal of Magnetic Resonance, Series B, 112, 82-86. 

28. Erickan, A. E., Waltham, M., Dicker, A. P., Schweitzer, B. I., Gritsman, H., Banerjee, D., and Bertino, J. R. (1996) 
Variants of human dihydrofolate reductase with substitutions at leucine-22: effect on catalytic and inhibitor binding 
properties. Molecular Pharmacology 49, 430-437. 

29. Foti, M., Marshalko, S. J., Beardsley, G. P., and Schweitzer, B. I. (1997) Solution structure of a DNA decamer 
containing the antiviral agent ganciclovir: Determination by two-dimensional homonuclear and heteronuclear NMR, 
restrained molecular dynamics, and complete relaxation matrix analysis. Biochemistry 36, 5336-5345. 

30. Kumar, S., Reed, M., Gamper, H., Gorn, V., Lukhtanov, E., Kutyavin, I., Meyer, R., and Schweitzer, B. I. (1997) 
Solution structure of a DNA decamer conjugated on the 5' end with CDPI3. Nucleic Acids Research 26, 831-838. 

31. Shao, W., Jerva, L. F., West, J., Lolis, E., and Schweitzer, B. I. (1998) Solution structure of murine macrophage 
inflammatory protein-2 (MIP-2). Biochemistry 37, 8303-8313. 

32. Shao, W., Fernandez, E., Wilken, J., Thompson, D. A., Siani, M. A., West, J., Lolis, E., and Schweitzer, B. I. (1998) 
Accessibility of selenomethionine proteins by total chemical synthesis: structural studies of human herpes virus-8 MIP- 
II. FEBS Letters 441, 77-82. 

33. Schweitzer, B. I., Foti, M., Keertikar, K., Kumar, S., Gardner, K. H., and Tucker-Kellogg, G. (1999) The use of 31P 
relaxation experiments to probe the effects of nucleoside analogs on DNA dynamics. Phosphorus, Sulfur, and Silicon 
12, 36-41. 

34. Foti, M., Omichinski, J. G., West, J., Stahl, S., and Schweitzer, B. I. (1999) Effects of nucleoside analog incorporation 
on DNA interactions with the DNA binding domain of the GATA-1 erythroid transcription factor. FEBS Letters 444, 
47-53. 

35. Schweitzer, B., Wiltshire, S., Lambert, J., O'Malley, S., Kukanskis, K., Zhu, Z., Ward, D., and Kingsmore, S. F. (2000) 
Immunoassays with rolling circle DNA amplification: A versatile platform for ultrasensitive antigen detection. 
Proceedings of the National Academy of Sciences (USA) 97, 10113-10119. 

36. Wiltshire, S., O'Malley, S., Lambert, J., Kukanskis, K., Edgar, D., Kingsmore, S. F., and Schweitzer, B. I. (2000) 
Detection of multiple allergen-specific IgE on microarrays by immunoassay with rolling circle amplification. Clinical 
Chemistry 46, 1990-1993. 

37. Shao, W., Fernandez, E., Wilken, J., Thompson, D. A., Schweitzer, B. I., and Lolis, E. (2001) CCR2 and CCR5 
receptor-binding properties of herpesvirus-8 vMIP-II based on sequence analysis and its solution structure. European 
Journal of Biochemistry 268, 2948-2959. 

38. Gusev, Y., Sparkowski, J., Ferguson, H., Montano, J., Visconti, R., Bogdan, N., Schweitzer, B., Wiltshire, S., 
Kingsmore, S., Maltzman, W., and Wheeler, V. (2001) Rolling circle amplification - A new approach to increase 
sensitivity for immunohistochemistry and flow cytometry. Journal of American Pathology 159, 63-69. 

39. Mullenix, M. C., Wiltshire, S., Shao, W., Kitos, G., and Schweitzer, B. (2001) Allergen-specific IgE detection on 
microarrays using rolling circle amplification: correlation with in vitro assays for serum IgE. Clinical Chemistry 47, 
1926-1929. 

40. Nallur, G., Luo, C., Fang, L., Dave, V., Lambert, J., Lasken, R., Kingsmore, S., and Schweitzer, B. (2001) Signal 
amplification by rolling circle amplification on DNA microarrays. Nucleic Acid Research 29, e118. 

41. Schweitzer, B., Roberts, S., Grimwade, B., Shao, W., Wang, M., Fu, Q., Shu, Q., Laroche, I., Christiansen, Velleca, 
M., and Kingsmore, S. F. (2002) Multiplexed protein profiling on microarrays by rolling circle amplification. 
Application to dendritic cell cytokine secretion. Nature Biotechnology 20, 259-365. 

42. Nallur, G., Marrero, R., Luo, C., Krishna, R., Bechtel, P., Shao, W., Ray, M., Wiltshire, S., Fang, L., Huang, H., Liu, C.- 
G., Sun, L., Sawyer, J., Kingsmore, S., Schweitzer, B., and Xia, J. (2003) Protein and nucleic acid detection by rolling 
circle amplification on gel-based microarrays. Biomedical Microdevices 5, 115-123. 

43. Michaud, G., Salcius, M., Zhou, F., Bangham, R., Bonin, J., Guo, H., Snyder, M., Predki, P., and Schweitzer, B. (2003) 
Analyzing antibody specificity with whole proteome microarrays. Nature Biotechnology 21, 1509-1512. 

44. De Yang, Q., Rosenberg, H., Rybak, S., Newton, D., Wang, Z., Fu, Q., Tchernev, V., Wang, M., Schweitzer, B., 
Kingsmore, S., Patel, D., Oppenheim, J., and Howard, O. (2004) Human ribonuclease A superfamily members 
eosinophil-derived neurotoxin (EDN) and pancreatic ribonuclease (hPR), induce dendritic cell maturation and 
activation. Journal of Immunology 173, 6134-6142. 

45. Predki, P.F., Mattoon, D., Bangham, R., Schweitzer, B., and Michaud, G. Protein microarrays: A new tool for profiling 
antibody cross-reactivity. Human Antibodies. 2005, 14:7-15. 

46. Ptacek, J., Devgan, G., Michaud, G., Zhu, H., Zhu, X., Fasolo, J., Guo, H., Jona, G., Breitkreutz, A., Sopko, R., 
McCartney, R., Schmidt, M., Rachidi, N., Lee, S.-J., Mah, A., Meng, L., Stark, M., Stern, D., De Virgilio, C., Tyers, 
M., Andrews, B., Gerstein, M., Schweitzer, B., Predki, P., and Snyder, M. (2005) Global analysis of protein 
phosphorylation in yeast. Nature 438, 679-84. 

47. Boyle, S.N., Michaud, G.A., Schweitzer, B., Predki, P.F., and Koleske, A.J. (2007) A critical role for cortactin 
phosphorylation by Abl-family kinases in PDGF-induced dorsal-wave formation. Current Biology 17, 445-51. 

48. Meng, L., Michaud, G.A., Merkel, J.S., Zhou, F., Huang, J., Mattoon, D.R., and Schweitzer, B. (2008) Protein kinase 
substrate identification on functional protein arrays. BMC Biotechnology 8, 22-30. 

49. Schmid, K., Keasey, S., Pittman, P., Emerson, G.L., Meegan, J., Tikhonov, A.P., Chen, G., Schweitzer, B., and 
Ulrich, R.G. (2008) Analysis of the human immune response to vaccinia by use of a novel protein microarray suggests 
that antibodies recognize less than 10% of the total viral proteome. Proteomics Clinical Applications 2, 1528-1538. 

50. Sboner, A., Karpikov, A., Chen, G., Smith, M., Mattoon, D., Freeman-Cook, L., Schweitzer, B., Gerstein, M. B. (2009) 
Robust linear model normalization to reduce technical variability in functional protein microarrays. Journal of 
Proteome Research 8, 5451-5464. 

51. Keasey, S. L., Schmid, K.E., Lee, M.S., Meegan, J., Tomas, P., Minto, M., Tikhonov, A.P., Schweitzer, B., Ulrich, R.G. 
(2009) Extensive antibody cross-reactivity among infectious gram-negative bacteria revealed by proteome microarray 
analysis. Molecular and Cellular Proteomics 8, 924-935. 
Invitrogen | Application Note | ProtoArray™ High-Density Protein Microarrays 

Protein-Protein Interaction Profiling on Invitrogen ProtoArray™ 
High-Density Protein Microarrays 

A powerful means of determining the function of a protein is to map its interactions with other proteins. A variety of approaches 
are available to study protein-protein interactions, including mass spectroscopy, and yeast two-hybrid methods (1). Yet these tech- 
nologies have several drawbacks: they are time-consuming, require expensive and specialized equipment as well as considerable 
expertise to run the equipment, and utilize large amounts of sample. Several large-scale efforts to map protein-protein interactions 
using mass spectroscopy or yeast two-hybrid have been performed recently (2, 3). Interestingly, a comparison of the results of these 
studies shows little overlap between the interactions observed in each, suggesting that the accuracy or the coverage of the methods 
may be lacking (4). 

Protein microarrays have introduced a new approach to identify and characterize protein interactions, providing the ability to 
rapidly identify new interactions between thousands of proteins in a single experiment (5). Since the location and identity of each 
protein on the array is known, interaction maps can be developed rapidly from iterative probings of protein arrays. Because a pro- 
tein microarray experiment is performed within a day, and interactions are assessed in the context of thousands of other proteins, 
interaction profiling on microarrays can greatly accelerate the rate at which novel protein interactions are discovered. Additionally, 
the in vitro nature of protein microarray experiments permits control over probing conditions that affect protein interactions such 
as protein concentration, post-translational modifications, and presence of cofactors, which may not be possible with other methods 
such as yeast two-hybrid screening. 

MacBeath and Schreiber were among the first to demonstrate the potential of protein microarrays in protein-protein interaction, 
biochemical, and drug binding studies. In this study, pairs of proteins that were known to interact with each other (protein G and 
the immunoglobulin (IgG), p50 and IκBα, and the FKBP12 binding domain of FKBP with the human immunophilin FKBP12) were 
shown to interact on protein microarrays (6). Although this study represented a critical milestone in the development of functional 
protein arrays, only a few proteins were analyzed and novel activities were not identified. Since this report, a series of publications 
have demonstrated that proteins can retain their expected interactions while immobilized on microarray surfaces. Espejo et al. dem- 
onstrated that protein interaction domains, such as Src homology (SH2), 14-3-3, forkhead-associated (FHA), PDZ, pleckstrin homol- 
ogy (PH), and FF domains arrayed onto nitrocellulose-coated microarrays retain function and specificity, interacting with their 
corresponding ligands (7). Newman and Keating have used microarrays to characterize binary coiled-coil interactions from human 
basic-region leucine zipper transcription factors (8). [illegible] 
interactions among several human DNA replication initiation proteins (9). Finally, in what may be the most striking example of the 
power of protein microarrays, Michael Snyder and colleagues at Yale University reported the fabrication of an array containing the 
majority of proteins from the yeast proteome and the use of this array to identify a new binding motif for calmodulin (10). 
Invitrogen ProtoArray™ Products 

Invitrogen has recently introduced the 
ProtoArray™ Microarray Technology for study- 
ing molecular interactions on protein arrays. The 
ProtoArray™ products include the ProtoArray™ 
Yeast Proteome Microarray nc v1.0, which con- 
tains 4088 open reading frames (ORFs) from 
Saccharomyces cerevisiae, and the ProtoArray™ 
Human Protein Microarray nc v1.0, which con- 
sists of nearly 1,900 human proteins. All pro- 
teins are expressed as N-terminal glutathione- 
S-transferase (GST) fusion proteins, purified, 
and spotted in duplicate on nitrocellulose-coated 
1 inch x 3 inch glass slides. Using ProtoArray™ 
Microarrays allows screening of target proteins 
of interest for interaction with thousands of pro- 
teins in as little as four hours. Detection on the 
arrays is sensitive (as little as 1 pg protein on the 
array can be detected with submicrogram quanti- 
ties of probe protein) and reproducible. 

To detect protein-protein interactions on 
ProtoArray™ Microarrays, the protein probe must 
contain a label or tag to visualize the interaction of 
the probe with array proteins. The extremely high 
affinity of the biotin-streptavidin interaction makes 
biotin-protein conjugation a preferred method for 
protein labeling. Invitrogen offers the ProtoArray™ 
PPI Complete Kit for biotinylated proteins, which 
contains a module for efficiently biotinylating small 
amounts of a protein as well as qualified reagents 
for blocking, washing, and detecting biotinylated 
protein probes with streptavidin conjugated to a 
fluorescent dye, Alexa Fluor® 647. 

Another preferred method of detecting protein 
interactions on ProtoArray™ Microarrays is to 
use protein probes with an epitope tag and a 
labeled antibody against the tag. An example 
of such a tag is the V5 epitope, a 14 amino acid 
(GKPIPNPLLGLDST) epitope derived from the 
P and V proteins of the paramyxovirus SV5. 
Invitrogen offers several Gateway® expression vec- 
tors that allow the fusion of the V5-tag to a pro- 
tein of interest. The ProtoArray™ PPI Complete 
Kit for epitope-tagged proteins from Invitrogen 
provides reagents for blocking, washing, and 
detecting a V5-tagged protein using an Anti-V5- 
Alexa Fluor® 647 Antibody developed specifically 
for this application. 

This Application Note demonstrates the util- 
ity of Yeast and Human ProtoArray™ Protein 
Microarrays for detecting protein-protein inter- 
actions using the biotinylated or epitope-tagged 
protein probes. 

Materials and Methods 

Yeast Proteome collection: The yeast proteome 
collection was derived from the yeast clone col- 
lection of yeast ORFs generated by the Snyder 
laboratory as described by Zhu et al. (10). Each 
S. cerevisiae open reading frame (ORF) was 
expressed as an N-terminal GST-6xHis fusion 
protein in a yeast expression vector. The identity 
of each clone was verified using 5'-end sequenc- 
ing and the expression of GST-tagged fusion pro- 
tein by each clone was confirmed with Western 
immunodetection using an anti-GST antibody. 
After verifying that each clone expresses a protein 
of the expected molecular weight, the proteins 
(from 4,088 clones) were expressed and purified 
using high-throughput procedures (10). 

Human protein collection: The majority of the 
human protein collection is derived from the 
human Ultimate™ ORF Clone Collection available 
from Invitrogen (see http://orf.invitrogen.com 
for more information). The human proteins 
were expressed in the Bac-to-Bac® Baculovirus 
Expression System (Invitrogen Cat. no. 10359-016; 
for more information on the Bac-to-Bac® Baculovirus 
Expression System, visit www.invitrogen.com). 
Each Ultimate™ ORF Clone (entry clone) consists 
of a human ORF cloned into a Gateway® entry 
vector. Each entry clone was subjected to an LR 
reaction with the Gateway® destination vector, 
pDEST™20, to generate an expression clone. The 
LR reaction mix obtained after performing the 
LR reaction was transformed into competent 
DH10Bac™ E. coli to generate a recombinant bac- 
mid. The high molecular weight recombinant 
bacmid DNA was isolated and transfected into 
Sf9 insect cells to generate a recombinant bacu- 
lovirus that was used for preliminary expression 
experiments. After the baculoviral stock was 
amplified and titered, the high-titer stock was 
used to infect Sf9 insect cells for expression of the 
recombinant protein of interest in 96 deep-well 
plates. Following a 3-day growth, the insect cells 
were harvested for purification. All steps of the 
purification process, including cell lysis, binding 
to affinity resins, washing, and elution, were car- 
ried out at 4°C. Insect cells were lysed under non- 
denaturing conditions and lysates were loaded 
directly into 96-well plates containing glutathi- 
one resin. After washing, purified proteins were 
eluted under conditions designed to obtain native 
proteins. After purification, samples of the puri- 
fied proteins were run in SDS-PAGE gels and 
immunodetected by Western blot. The gel images 
were processed to generate a table of all the pro- 
tein molecular weights detected for each sample. 

ProtoArray™ manufacturing: The protein puri- 
fication process described above produces thou- 
sands of purified proteins ready to be printed on 
arrays. A contact-type printer equipped with 48 
matched quill-type pins is used to deposit each 
of these proteins along with a set of control ele- 
ments in duplicate spots on 1" x 3" glass slides. 
The printing of these arrays is performed in a cold 
room under dust-free conditions to preserve the 
integrity of protein samples and printed micro- 
arrays. Before releasing the protein microarrays 
for use, each lot of arrays is subjected to rigor- 
ous quality control procedures, including visual 
inspection of all the printed arrays to check for 
scratches, fibers, smearing, etc. To control for the 
quality of the printing process, several microar- 
rays are probed with an anti-GST antibody. Since 
each protein contains a GST fusion tag, this 
procedure measures the variability in spot mor- 
phology, the number of missing spots, the pres- 
ence of control spots, and the amount of protein 
deposited in each spot. 

Cloning, Expression, and Purification of Proteins 
(6xHis-V5-BioEase™-EK protein fusions): Ultimate™ 
ORF clones were obtained as entry clones and 
LR cloned into pET104 for expression in E. coli. 
For each ORF, plasmid DNA was transformed into 
BL21 Star™ (DE3) E. coli cells, which were plated on 
LB/Amp and grown overnight at 37°C. Several colo- 
nies from each of the 12 constructs were picked 
from LB/Amp plates and transferred into 50 ml of 
LB/Amp. Cultures were grown from 5 to 7 hours 
at 37°C until an OD600 of 0.5 to 0.6 was reached. 
Next, 50 µl of 0.1 M IPTG was added to give a 
final concentration of 100 µM, and these cultures 
were incubated overnight at 20°C. Cell lysates 
were prepared using the protocol described in the 
ProBond™ Purification Resin manual. Pellets were 
resuspended with 3 ml Native Binding Buffer; 
8 mg lysozyme was added and cells were lysed for 30 min- 
utes on ice. Cells were then sonicated on ice 
with six 10-second bursts and then centrifuged at 
3,500 rpm for 20 minutes. Lysate (8 ml) was loaded 
onto a column with 2 ml washed ProBond™ resin 
and incubated for 1-2 hours at 4°C. The column 
was washed with Native Wash Buffer followed by 
an elution with 10 ml Elution Buffer. The pooled 
fractions were dialyzed twice against 2 L PBS. 
All samples were concentrated on Millipore spin 
membrane cartridges (10,000 MW cut-off) to a 
final volume of 250-350 µl, and were brought to 5% 
glycerol by the addition of an appropriate amount 
of 100% glycerol. Samples were then quick-frozen 
in liquid nitrogen and stored at -80°C. 
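
The induction and glycerol steps above rest on simple dilution arithmetic (C1·V1 = C2·V2). The following is a minimal sketch of that arithmetic only; the function names are hypothetical, the IPTG calculation ignores the small volume added to the culture, and the 300 µl sample volume is an assumed mid-range value.

```python
def final_concentration(stock_conc, stock_volume, final_volume):
    """Solve C1*V1 = C2*V2 for C2; both volumes must use the same unit."""
    return stock_conc * stock_volume / final_volume


def stock_volume_to_add(sample_volume, target_fraction, stock_fraction=1.0):
    """Volume of stock to add so that the stock component makes up
    target_fraction of the final mixture (sample volume + added volume)."""
    return sample_volume * target_fraction / (stock_fraction - target_fraction)


# IPTG: 50 µl of 0.1 M stock into a 50 ml (50,000 µl) culture, result in mol/l.
iptg_molar = final_concentration(0.1, 50, 50_000)
print(f"IPTG final concentration: {iptg_molar * 1e6:.0f} µM")

# Glycerol: bring an assumed 300 µl sample to 5% using 100% glycerol.
glycerol_ul = stock_volume_to_add(300, 0.05)
print(f"100% glycerol to add: {glycerol_ul:.1f} µl")
```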

In vitro biotinylation of proteins: Human calmod- 
ulin (Upstate) was biotinylated using the protocol 
outlined in the ProtoArray™ Mini-Biotinylation 
Kit (Invitrogen). Briefly, protein was biotinylated 
at room temperature for 1 hour and the sample 
was applied to a gel filtration column to remove 
unincorporated biotin. Protein concentration and 
the extent of labeling were also assessed. 

Alexa Fluor® 647-Streptavidin based detection: The 
protein-protein interaction assay was performed 
using the protocol outlined in the ProtoArray™ 
PPI Complete Kit for biotinylated proteins 
(Invitrogen). Arrays were blocked with 1% BSA/ 
PBST at 4°C for 1 hour. Proteins were diluted 
in probe buffer (1X PBS, 5 mM MgCl2, 0.5 mM 
DTT, 5% glycerol, 0.05% Triton X-100, 1% BSA) 
to 5 or 50 ng/µl and added to arrays under a cover 
slip, Hybrislip (included in the kit). Proteins 
were incubated at 4°C for 90 minutes in a 50 ml 
conical tube and then transferred to an incuba- 
tion/hybridization chamber (included with the 
kit). Arrays were washed three times with probe 
buffer. Subsequently, a solution of Alexa Fluor® 
647-streptavidin (Invitrogen, 0.75 µg/ml) in probe 
buffer was added and incubated at 4°C for 30 min- 
utes. Arrays were washed three times and dried. 

Anti-V5-Alexa Fluor® 647 based detection: The 
protein-protein interaction assay was performed 
using the protocol outlined in the ProtoArray™ 
PPI Complete Kit for epitope-tagged proteins 
(Invitrogen). Arrays were blocked with 1% BSA/ 
PBST at 4°C for 1 hour. Proteins were diluted in 
probe buffer to 5 or 50 ng/µl and added to arrays 
under a Hybrislip cover slip. Proteins were incu- 
bated at 4°C for 90 minutes in a 50 ml conical tube 
and then transferred to an incubation/hybrid- 
ization chamber (included in the kit). Arrays 
were washed three times with probe buffer. 
Subsequently, a solution of anti-V5-Alexa Fluor® 
647 conjugated antibody (Invitrogen, 0.25 ng/ml) 
was added and incubated at 4°C for 30 minutes. 
Arrays were washed three times and dried. 

Data acquisition/analysis: The microarray was 
scanned with a GenePix® 4000B Fluorescent 
Scanner (Molecular Devices). Data was acquired 
with GenePix® Pro software (Molecular Devices) 
and processed using ProtoArray™ Prospector 
(a software tool developed by Invitrogen that 
automatically performs data analysis, see 
www.invitrogen.com/protoarray for details) 
or Microsoft Excel and Microsoft Access. 
Statistically significant signals on each protein 
array were identified. The significant signals are 
greater than or equal to a value that is determined 
by calculating the median plus three standard 
deviations of the signal minus background values 
for all non-control proteins on the array. 
Interactors were defined 
as proteins having positive significance calls not 
observed on the appropriate negative control. 
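
The significance rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ProtoArray™ Prospector implementation: the function names and the dictionary input format are hypothetical, the sample standard deviation is assumed (the text does not say which estimator is used), and the negative control array is assumed to be thresholded on its own values.

```python
from statistics import median, stdev


def significance_threshold(net_signals):
    """Median plus three standard deviations of signal-minus-background values.

    Assumption: sample standard deviation; the source does not specify.
    """
    return median(net_signals) + 3 * stdev(net_signals)


def call_interactors(probe, negative_control):
    """Names of proteins significant on the probed array but not on the control.

    Each argument maps protein name -> signal minus background for the
    non-control spots of one array (hypothetical input format).
    """
    probe_cut = significance_threshold(list(probe.values()))
    control_cut = significance_threshold(list(negative_control.values()))
    probe_hits = {name for name, value in probe.items() if value >= probe_cut}
    control_hits = {
        name for name, value in negative_control.items() if value >= control_cut
    }
    return sorted(probe_hits - control_hits)
```

With this rule a spot that is bright on both the probed array and the negative control (for example, a protein that binds the detection reagent directly) is excluded, while a spot that is bright only on the probed array is reported as an interactor.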

Results 

Probing ProtoArray™ Yeast Proteome Microarrays 
with biotinylated yeast proteins: Four yeast pro- 
teins were biotinylated in vitro using the Invitrogen 
ProtoArray™ Mini-Biotinylation Kit. As shown in 
Figure 1, all four proteins showed expected inter- 
actions when used to probe the ProtoArray™ Yeast 
Proteome Microarray and detected with Alexa 
Fluor® 647-Streptavidin Conjugate. Each of the 
identified interactions is well annotated in the lit- 
erature using a variety of different approaches (see 
http://www.yeastgenome.org for further details). 
Note that the interactions shown in Figure 1 
are reciprocal. Biotinylated Ybr109C (calmodu- 
lin) interacts with Yfr014C (calmodulin kinase) 
on the array, and biotinylated Yfr014C interacts 
with Ybr109C on the array; the same relation- 
ship is observed with the GTP binding protein 




Figure 1. Probing the ProtoArray™ Yeast Proteome Microarray with in vitro biotinylated yeast proteins. Subarrays 
show expected interactions with biotinylated yeast proteins. Proteins were concentrated to 250 µg/ml and biotinylated using 
the ProtoArray™ Mini-Biotinylation Kit. 

Gsp1 (Ylr293C) and the nuclear transport protein 
Mog1 (Yjr074W). The reciprocal interactions are 
important for demonstrating the validity of the 
observed interactions and the functionality of the 
proteins on the array. 

Probing ProtoArray™ Human Protein Microarrays 
with Biotinylated and Epitope-tagged Human 
Proteins: To assess the utility of human protein 
arrays and protein-protein interaction detection 
technologies optimized at Invitrogen for dem- 
onstrating protein-protein interactions, proteins 
containing both a single biotin and a V5 tag were 
prepared (see Materials and Methods). Several 
N-terminal fusions of V5-BioEase™ human pro- 
teins were probed against human protein arrays 
(ProtoArray™ Human Protein Microarray nc 
v1.0) consisting of approximately 1,900 purified 
human proteins spotted in duplicate on a nitro- 
cellulose-coated glass slide. After probing the 
array with calmodulin 2 (CALM2), we observed 
that CALM2 interacted with several proteins on 
the array. Most notable are the interactions with 
calcium/calmodulin-dependent protein kinase IV 
(CAMK4) and calcium/calmodulin-dependent 
protein kinase I (CAMK1) (Figure 2). These 
interactions were observed when streptavidin 
(data not shown) or anti-V5 based detection was 
used (Figure 2). We also used the ProtoArray™ 
Mini-Biotinylation Kit to in vitro biotinylate 
recombinant human calmodulin and used this 
protein to probe the ProtoArray™ Human Protein 
Microarray nc v1.0. As shown in Figure 3, similar 
protein interactions with CAMK1 and CAMK4 
were observed for in vitro biotinylated calmodu- 
lin as with the BioEase™-tagged CALM2, dem- 
onstrating that valid protein-protein interaction 
data can be obtained by using proteins that are 
biotinylated using in vitro or in vivo methods. 

To demonstrate the utility and ease of use of 
ProtoArray™ Technology for identifying novel 
protein-protein interactions, a V5-BioEase™ 
fusion to the protein cyclin-dependent kinase 
inhibitor 1B (CDKN1B, p27, Kip1) was used to 
probe a ProtoArray™ Human Protein Microarray. 
We identified a specific interaction with cyclin- 
dependent kinase 7 (Cdk7, MO15 homolog, 
Xenopus laevis, cdk-activating kinase) (Figure 4). 
The same interaction was also observed using 
streptavidin-based detection (data not shown). 
Although this interaction has not been reported 
previously in the literature, an interaction of 
CDKN1B with Cdk[illegible] has been reported, and it has 
been proposed that retinoic acid induces cell cycle 
arrest in tumor cell lines by promoting formation 
of this complex (11). To validate the interaction, 
we performed the following reciprocal protein- 
protein interaction assay: CDKN1B was spotted 
on a nitrocellulose coated slide, then probed 



Figure 2. ProtoArray™ Human Protein Microarray 
nc v1.0 probed with Human CALM2. Interactions de- 
tected with anti-V5-Alexa Fluor® 647 Dye. 

Panel A Whole slide image. 

Panel B Interaction of CALM2 with CAMK4. Signals 
from Alexa Fluor® Antibody and V5 control are shown. 
Alexa Fluor® labeled antibody is in every subarray and 
used as a reference marker for aligning the data acquisi- 
tion grid. The V5 control is a V5 tagged protein printed 
on the slide. Signal with this protein indicates that assay 
detection is functioning properly. 



Figure 3. ProtoArray™ Human Protein Microar- 
ray nc v1.0 probed with in vitro biotinylated human 
calmodulin. Interactions detected with streptavidin 
Alexa Fluor® 647 Dye. 
Panel A Whole slide image. 

Interactions of human calmodulin with CAMK4 (Panel B) 
and CAMK1 (Panel C). Signals from Alexa Fluor® Anti- 
body and biotinylated antibody gradient are shown. The 
biotinylated antibody gradient is used as assay detection 
control. Signal with this protein indicates that assay de- 
tection is functioning properly. 



with GST-tagged Cdk7, and the Cdk7-CDKN1B 
complex was detected using an anti-GST anti- 
body. Similar probings with 18 other GST-tagged 
proteins gave signals with the spotted CDKN1B 
that were on average approximately 10-fold lower 
than Cdk7 (Figure 5), indicating that the Cdk7- 
CDKN1B interaction is quite specific. 
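
The fold difference quoted above is the ratio of the signal from the probe of interest to the average signal from the other probes. A minimal sketch of that comparison follows; the function name is hypothetical and the signal values are invented for illustration only, not taken from Figure 5.

```python
from statistics import mean


def fold_specificity(signals, target):
    """Ratio of the target probe's signal to the mean signal of all other probes.

    signals maps probe name -> signal minus background at the spotted protein.
    """
    others = [value for name, value in signals.items() if name != target]
    return signals[target] / mean(others)


# Hypothetical values for illustration only.
example = {"Cdk7": 12000.0, "probe_A": 1000.0, "probe_B": 1400.0}
print(f"{fold_specificity(example, 'Cdk7'):.1f}-fold over the other probes")
```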

Summary 

ProtoArray™ Protein Microarrays with Alexa 
Fluor® detection technologies are optimized to 
quickly identify novel protein-protein interac- 
tions. High-quality reagents, protocols and tech- 
nical support are available. Consult the Invitrogen 
website for the latest information regarding pro- 
tein microarrays for protein interaction profiling 
using ProtoArray™ Technology. 
Figure 4. ProtoArray™ Human Protein Microarray nc v1.0 probed with CDKN1B. Interactions detected with anti- 
V5-Alexa Fluor® 647 Dye. Panel A Whole slide image. Panel B Interaction of CDKN1B with CDK7. Signals from Alexa 
Fluor® Antibody and V5 control are shown. 
Figure 5. Reciprocal Protein Interaction Assay. Nineteen GST-fusions were expressed in Sf9 cells, purified using 
glutathione chromatography, and probed against an array containing immobilized CDKN1B. The Y-axis is the signal 
minus background value for the CDKN1B spot for each protein probed (X-axis, "Protein Probes") against the array. The accession numbers (MGC 
or RefSeq) for the protein probes are listed (X-axis). The MGC accession number for Cdk7 is BC005298. CDKN1B was 
spotted at an equivalent solution protein concentration of approximately 12 ng/µl. The median probing concentration for 
the 19 proteins was 11 ng/µl; the mean concentration was 12 ng/µl. 
Development and Validation of Kinase Substrate
Screening on ProtoArray™ High-Density Protein
Microarrays

B. Schweitzer, G. Michaud, R. Bangham, J. Bonin, M. Salcius, and P. Predki 

Abstract 

Identifying biologically relevant substrates for protein kinases is a critical step in understanding 
the function of these clinically important enzymes. Traditional approaches for kinase substrate 
identification are expensive, slow, and lack sensitivity. For this reason, many kinase activity assays 
employ generic substrates or peptides that decrease the reliability of these assays for drug 
development. We describe here the development and validation of a rapid and sensitive microar- 
ray-based kinase substrate identification technology, which enables parallel screening of kinases 
against thousands of potential native protein substrates. This paper describes the validation of 
this approach and use of the resulting data for pathway mapping. 



Introduction
Protein kinases play a central role in the regulation of multiple cellular processes and in diseases;
in fact, 244 kinases have been mapped to disease loci (1). It is not surprising, therefore, that a large
number of biotech and pharmaceutical companies are seeking to discover and bring to the clinic
compounds that demonstrate specific inhibition of kinases involved in disease. Some examples of kinase
inhibitors already in the clinic include Gleevec® (Novartis), an Abl and c-Kit kinase inhibitor that has
been successful in the treatment of chronic myeloid leukemia and gastrointestinal stromal tumors,
and Herceptin® (Genentech), an antibody that targets the HER2/neu (erbB2) protein for treatment of
breast cancer. The family of human protein kinases consists of more than 500 members, of which only
a fraction have been characterized to date. Much is still not known about the biological function of
many kinases, the protein substrates that are phosphorylated by these kinases, or the roles of these
kinases and substrates in disease.

The importance of protein kinases in virtually all processes regulating cell transduction illustrates the 
potential for kinases and their cellular substrates as targets for therapeutics. Considerable efforts have 
been made to elucidate kinase biology by identifying the substrate specificity of kinases and using 
this information for the prediction of new substrates. Some of the approaches used to date include 
creation of a database from annotated phosphorylation sites, prediction of substrate sequence patterns 
from available structures of kinase/peptide substrate complexes, and screening of peptide libraries
and peptide arrays (2, 3). More recent efforts include attempts to map the phosphoproteome using
mass spectrometry-based techniques. While these studies have provided some information about
kinase biology, they have been severely limited by their complexity, expense, lack of sensitivity, the 
use of non-structured peptides, and by poor representation of potential substrates in the screens. 

Invitrogen is pioneering the use of arrays of whole or partial proteomes to improve the success rates 
of drug discovery. This report describes how ProtoArray™ technology rapidly converts gene sequences 
into arrays of functional proteins that can be used to reveal new disease pathways and define the 
specificity and selectivity of potential drugs. In addition, this paper discusses how the ProtoArray™ 
high-density protein microarray technology is an ideal format for identifying biologically relevant 
substrates for protein kinases in a rapid, cost-effective, and comprehensive fashion. 
Validation results

ProtoArray™ technology enables fast, simple, and comprehensive
kinase substrate screening.

Each ProtoArray™ microarray contains thousands of S. cerevisiae or
H. sapiens proteins spotted in high density on glass slides. These
slides can be probed to identify protein interactions with DNA,
proteins, lipids, sugars, small molecules, and enzymes. The first
proof-of-principle experiment demonstrating that these arrays can
be used to reveal substrates of protein kinases was carried out on
the Yeast ProtoArray™ microarray, which contains over 4000 unique
yeast proteins spotted in duplicate. The experimental outline is
simple (Figure 1A). A solution comprising a kinase and radioactive
ATP was incubated on a Yeast ProtoArray™ microarray, and then the
slide was washed and exposed to a phosphoimager (Figure 1B). The
experiment identified 41 proteins specifically phosphorylated by the
exogenous kinase.

Figure 1. Kinase-substrate assay on the yeast ProtoArray™
Microarray.

[Panel A: glass slide; analyze slide for radioactive spots (shown in red on right). Panel B: array image.]

A) Experimental design of substrate screening assay. B) The yeast ProtoArray™
microarray containing > 4000 different yeast proteins probed with a purified
kinase. Inset: positives boxed in green (autophosphorylation) and red (substrates).

Initial work with Human ProtoArray™ microarrays demonstrates
kinase substrate discovery value. To test our platform
for identification of kinase substrates, we chose the human
protein kinase Arg. This kinase, along with its closely related
homolog Abl, is known to be involved in the etiology of chronic
myeloid leukemia (CML) and is a target for the anti-cancer agent
Gleevec®. Human ProtoArray™ microarrays were manufactured
with 1500 different quality-controlled recombinant human proteins
produced in Invitrogen's proprietary high-throughput insect
cell expression and parallel purification systems. A known Abl/Arg
substrate, Crk, was printed at regular intervals on the array
as a positive control. The Human ProtoArray™ microarray in
Figure 2A was incubated with radiolabeled ATP alone; proteins
that show a signal on this array are kinases present on the array
that autophosphorylate. The array in Figure 2B was incubated
with Arg in the presence of radiolabeled ATP. This kinase
phosphorylated the control substrate Crk in every subarray; in
addition, nine other proteins that did not give signal with ATP alone
were observed to be phosphorylated in the presence of Arg. We
also looked at the effect of adding an Arg/Abl kinase-specific
inhibitor and found that the inhibitor specifically decreased
phosphorylation of Crk and the nine other microarray-identified
substrates (Figure 2C), confirming that these proteins were
phosphorylated by Arg kinase.
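
The three-array comparison described above can be sketched as a pair of set operations. This is only an illustration: it assumes each array has already been reduced to a set of positive spots, whereas real phosphoimager data are quantitative and would need a signal threshold. The function name and all spot names other than Crk are hypothetical.

```python
def confirmed_substrates(atp_only, atp_plus_kinase, atp_kinase_inhibitor):
    """Return spots that behave like substrates of the added kinase.

    A spot is a candidate if it is positive with ATP + kinase but not with
    ATP alone (a signal with ATP alone indicates autophosphorylation).
    A candidate counts as confirmed if it is no longer positive when the
    kinase-specific inhibitor is added.
    """
    candidates = set(atp_plus_kinase) - set(atp_only)
    return sorted(candidates - set(atp_kinase_inhibitor))


# Hypothetical positive-spot sets, for illustration only.
atp_only = {"kinase_X", "kinase_Y"}                          # autophosphorylating kinases
atp_plus_arg = {"kinase_X", "kinase_Y", "Crk", "protein_1", "protein_2"}
atp_arg_inhibitor = {"kinase_X", "kinase_Y"}                 # substrate signals lost

substrates = confirmed_substrates(atp_only, atp_plus_arg, atp_arg_inhibitor)
print(substrates)
```

Treating each signal as simply present or absent is a simplification; the source reports that the inhibitor decreased phosphorylation, which in practice would be judged against a chosen cutoff.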



Figure 2. Identification of substrates for Arg kinase. 




ProtoArray™ microarrays containing 1500 different human proteins were treated
with ATP (A), ATP and Arg (B), or ATP, Arg, and Arg-specific inhibitor (C). Nine
substrates were identified for Arg (boxed in red).

Verification of specific phosphorylation by a human kinase.
Arg kinase is known to specifically phosphorylate tyrosine
residues on certain proteins. To verify that Arg kinase maintains
this specificity for tyrosine residues in array-based experiments,
Human ProtoArray™ microarrays were treated sequentially with
Arg kinase followed by a phosphotyrosine phosphatase. As
shown in Figure 3, all proteins phosphorylated by Arg kinase
on the array are dephosphorylated by the phosphotyrosine
phosphatase, confirming that Arg kinase substrates on the array
are appropriately phosphorylated on tyrosine residues. Signals
from proteins that autophosphorylate (i.e., that show signal in the
absence of exogenous kinase) were not affected by phosphotyrosine
phosphatase treatment, indicating that these were kinases
that autophosphorylate serine/threonine residues.



Figure 3. Phosphotyrosine phosphatase reduces Arg substrate
phosphorylation.

[Panel labels: 33P-γ-ATP + Arg; 33P-γ-ATP + Arg + Tyr phosphatase]

A Human ProtoArray™ microarray containing the eight identified Arg substrates
was probed with Arg kinase and then subsequently with a phosphotyrosine
phosphatase.
Substrate phosphorylation is kinase-specific. The results with
Arg kinase on Human ProtoArray™ microarrays clearly
demonstrated that this kinase is highly selective in the protein
substrates that it phosphorylates. In order for this application of the
ProtoArray™ technology to be useful to a wide range of kinase
biologists, the ability to distinguish phosphorylation patterns of
different kinases must be established. Consequently, ProtoArray™
microarrays printed with 2500 different human proteins were
incubated with 33P-ATP and either Arg or PKC kinase (Figure 4)
or with 33P-ATP alone (not shown). As shown in Figure 4,
phosphorylation signals specific to each kinase were clearly observed.
The majority of signals present in both experiments were due to
autophosphorylation by some of the ~400 kinases printed on the
array. Analysis of the whole array revealed dozens of proteins
that were specific to one of the kinases. We have now
characterized the phosphorylation patterns of over a dozen different
human kinases and have identified large numbers of unique
substrates for each kinase.

Figure 4. Specificity of kinase phosphorylation on Human
ProtoArray™ microarrays.

Two Human ProtoArray™ microarrays were incubated with 33P-ATP and either
Arg (left column) or PKC kinase (right column). Two representative subarrays are
shown: in subarray 1 (top row), two proteins phosphorylated specifically by Arg
kinase are boxed in blue; in subarray 2 (bottom row), a protein phosphorylated
specifically by PKC is boxed in red.

Validation of substrate identification in an independent assay. 
Biochemical validation of the array-based substrate screening 
assay was initially carried out by determining whether proteins 
phosphorylated by Arg kinase on the array would also be phos- 
phorylated in a different assay format. Figure 5 shows the results 
of assays in which two of the substrate proteins were incubated 
in solution with Arg kinase in the presence of radiolabeled ATP. 
Separation of the reaction mixtures on denaturing gels demon- 
strated that proteins at the expected molecular weight of the 
substrate proteins were indeed phosphorylated in solution. These 
results strongly suggest that these proteins maintain their native 
conformation on the array, allowing them to be phosphorylated 
by specific kinases. 



Detailed validation studies reveal the highest affinity
substrate for a pharmacologically relevant kinase reported to date.
Although phosphorylation of proteins by kinases in experiments,
such as the one shown in Figure 5, is a prerequisite for
identifying substrates for these enzymes, additional lines of evidence are
needed to demonstrate physiological relevance. One such line of
evidence is data showing that the substrate is phosphorylated at
concentrations likely to occur in a cell. One of the eight proteins
identified on the ProtoArray™ microarray as a substrate for Arg
kinase was selected for more detailed Km measurements based
on the protein's known role in cell division. Analysis of the data
from this experiment yields a Km for the substrate of
approximately 50 nM (Figure 6). Not only is this value well within a
potential intracellular concentration for a protein, but it is also
lower than any value previously reported for Arg kinase.
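
For readers unfamiliar with how a Km is read from rate-versus-concentration data, the following sketch fits the Michaelis-Menten equation v = Vmax[S]/(Km + [S]) using the Hanes-Woolf linearization ([S]/v plotted against [S]). The source does not say which fitting method was used, so the method is an assumption here, and the data points are synthetic values generated from a Km of 50 nM purely for illustration; they are not the measurements behind Figure 6.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def fit_hanes_woolf(concs, rates):
    """Estimate (Vmax, Km) by least-squares on the Hanes-Woolf form.

    [S]/v = [S]/Vmax + Km/Vmax, so a straight-line fit of [S]/v against [S]
    has slope 1/Vmax and intercept Km/Vmax.
    """
    xs = list(concs)
    ys = [s / v for s, v in zip(concs, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Synthetic, noise-free data (hypothetical): substrate in nM, rate in arbitrary units.
concs_nM = [5, 10, 25, 50, 100, 250, 500]
rates = [michaelis_menten(s, vmax=1.0, km=50.0) for s in concs_nM]

vmax_fit, km_fit = fit_hanes_woolf(concs_nM, rates)
print(f"Vmax = {vmax_fit:.3f}, Km = {km_fit:.1f} nM")
```

With real, noisy gel-based measurements a direct nonlinear fit is usually preferred over any linearization, because linear transforms distort the error structure.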



Figure 5. Arg kinase phosphorylation of substrate in solution. 
Arg kinase alone or mixed with substrate proteins was incubated at 30°C for
30 minutes and then run on an SDS-PAGE gel and phosphoimaged.

Figure 6. Km determination for an Arg kinase substrate identified
on the ProtoArray™ microarray.

Arg kinase was incubated with different concentrations of the substrate protein,
and the phosphorylation of the protein was measured in a gel-based assay similar
to the one shown in Figure 5.
ProtoArray™ data is used to generate a new kinase pathway. In
addition to biochemical validation, it is also desirable to see
concordance of ProtoArray™ results with published data. In fact, a
search of the literature and publicly available databases revealed
that one of the proteins proven to be a substrate for Arg kinase
on a Human ProtoArray™ microarray, Shp1, had indeed been
annotated as a substrate for this kinase. Using a protein-protein
interaction assay on a Human ProtoArray™ microarray, we also
demonstrated for the first time that Arg kinase forms a stable
interaction with Shp1 (data not shown). Shp1 is a phosphotyrosine
phosphatase localized at the plasma membrane; our data, as
well as the published data, are therefore consistent with
co-localization and co-regulation of Shp1 phosphatase and Arg kinase
(Figure 7). Other published reports indicate that following
activation by Src, Arg and Abl kinases translocate into the nucleus,
although the functional consequences of this translocation have
not been clarified. ProtoArray™ results, however, clearly showed
that these kinases phosphorylated several transcription factors
that may have roles in cell cycle function. An RNA polymerase
was also phosphorylated, providing another line of evidence
that these kinases regulate RNA transcription and gene
expression. Equally intriguing is the finding that a
membrane-associated receptor present on the array was phosphorylated by Arg
kinase. Interaction of this receptor with a membrane-associated
kinase has been shown by others to result in the activation of
two kinases that have been implicated in oncogenesis. This
finding represents a new and potentially therapeutically relevant link
between the Arg/Abl kinases and cancer.

Figure 7. Pathway mapping with Arg kinase-substrate ProtoArray™
data.
Conclusion

We have combined unprecedented protein content with a
simple-to-use microarray assay to generate new knowledge about protein
kinases with unequalled efficiency. We have demonstrated
specific phosphorylation of both known and novel substrates using
Human and Yeast ProtoArray™ high-density protein microarrays
and have validated these proteins as substrates using more
standard assays. Combining this new type of information with
Invitrogen's other capabilities for measuring phosphatase activity,
protein-protein interactions, and drug inhibition on microarrays
allows scientists to link kinases to intracellular signaling
networks and generate new understandings about kinases and their
substrates as drug targets with unmatched speed and efficiency.

Implications of ProtoArray™ technology
The discovery of new kinase substrates by Invitrogen and its 
collaborators using the ProtoArray™ technology platform demon- 
strates the enormous value of high-content protein arrays. This 
was clearly illustrated in experiments using Arg kinase: nine sub- 
strates were identified using an array printed with 1500 human 
proteins, but six more were found using a 2500 protein array. 
Extrapolating to an array containing a representative protein 
from the approximately 30,000 human genes (the UniProteome) 
suggests that over 150 substrates would be identified, thereby 
greatly increasing the informational value of the experiment. 
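
The extrapolation in this paragraph can be checked with simple proportional scaling. The sketch below assumes, as a simplification not stated in the source, that the fraction of arrayed proteins that are Arg substrates stays constant as the array grows; the function name is hypothetical.

```python
def extrapolate_hits(hits, proteins_on_array, target_proteins):
    """Scale an observed substrate count linearly to a larger protein set."""
    return hits * target_proteins / proteins_on_array


hits_1500 = 9               # substrates found on the 1500-protein array
hits_2500 = hits_1500 + 6   # six more found on the 2500-protein array

# Both arrays give the same hit rate (0.6% of printed proteins).
rate_1500 = hits_1500 / 1500
rate_2500 = hits_2500 / 2500

# Projection to roughly 30,000 human genes, one representative protein each.
projected = extrapolate_hits(hits_2500, 2500, 30000)
print(rate_1500, rate_2500, projected)
```

Under that constant-rate assumption the projection comes to 180 substrates, which is consistent with the "over 150" figure quoted in the text.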
The eukaryotic protein kinase superfamily: kinase
(catalytic) domain structure and classification

*Department of Cell Biology, Vanderbilt University School of Medicine, Nashville, TN, and
The eukaryotic protein kinases comprise one of the
largest superfamilies of homologous proteins and
genes. Within this family, there are now hundreds of
different members whose sequences are known.
Although there is a rich diversity of structures, regulation
modes, and substrate specificities among the protein
kinases, there are also common structural features.
These conserved structural motifs provide clear
indications as to how these enzymes manage to transfer the
γ-phosphate of a purine nucleotide triphosphate to the
hydroxyl groups of their protein substrates. The
authors of this review have carried out a monumental
task of analyzing and collating the amino acid
sequences of all reported protein kinases and defining
the conserved structural features that characterize the
portion of these proteins that is responsible for their
catalytic activity. Comparison of the sequences in the
catalytic fragment of the protein kinases has been used
to arrange these enzymes in evolutionary trees that
group subfamilies of closely related enzymes. It is
comforting that the structural relationships that emerge
from these trees result in groupings that also reflect
related functions. The work presented in this review
seems to be an excellent example of the type of
analysis that will become indispensable in the coming years,
as more and more sequence information becomes
available to biologists as a result of the genome projects.



ABSTRACT The eukaryotic protein kinases make up a
large superfamily of homologous proteins. They are
related by virtue of their kinase domains (also known as
catalytic domains), which consist of ~250-300 amino acid
residues. The kinase domains that define this group of
enzymes contain 12 conserved subdomains that fold into
a common catalytic core structure, as revealed by the
3-dimensional structures of several protein-serine
kinases. There are two main subdivisions within the
superfamily: the protein-serine/threonine kinases and the
protein-tyrosine kinases. A classification scheme can be
founded on a kinase domain phylogeny, which reveals
families of enzymes that have related substrate
specificities and modes of regulation.
Hanks, S. K., Hunter, T. The eukaryotic protein kinase
superfamily: kinase (catalytic) domain structure and
classification. FASEB J. 9, 576-596 (1995)

Key Words: protein-tyrosine kinase • protein-serine
kinase • protein phosphorylation • AMP-dependent protein kinase

THE EUKARYOTIC PROTEIN KINASE SUPERFAMILY 

One of the largest known protein superfamilies is made
up of protein kinases identified largely from eukaryotic
sources. (The term superfamily will be used here to
distinguish this broad collection of enzymes from smaller,
more closely related subsets that have been commonly
referred to as families). These enzymes use the
γ-phosphate of ATP (or GTP) to generate phosphate
monoesters using protein alcohol groups (on Ser and
Thr) and/or protein phenolic groups (on Tyr) as
phosphate acceptors. The protein kinases are related by virtue
of their homologous kinase domains (also known as
catalytic domains), which consist of ~250-300 amino acid
residues (reviewed in refs 1-3; and see below). During the
past 15 years, previously unrecognized members of the
eukaryotic protein kinase superfamily have been
uncovered at an exponentially increasing rate and currently
appear in the literature almost weekly. This pace of
discovery can be attributed to the past development of
molecular cloning and sequencing technologies and, more
recently, to the advent of the polymerase chain reaction
(PCR),³ which facilitated the use of homology-based
cloning strategies. Consequently, about 200 different
superfamily members (products of distinct paralogous genes)
had been recognized from mammalian sources alone!
The prediction made several years ago (4) that the
mammalian genome contains about 1000 protein kinase genes
(roughly 1% of all genes) would still appear to be within
reason, and may even be an underestimate (5).

In addition to mammals and other vertebrates,
eukaryotic protein kinase superfamily members have been
identified and characterized from a wide range of other
animal phyla as well as from plants, fungi, and
protozoans. Hence, the protein kinase progenitor gene can be
traced back to a time before the evolutionary separation
of the major eukaryotic kingdoms. The identification of
eukaryotic-like protein kinase genes in prokaryotes (6, 7)
raises the possibility that the protein kinase progenitor
gene might have arisen before the divergence of
prokaryotes and eukaryotes (see below). Studies of the
budding and fission yeasts, Saccharomyces cerevisiae and
Schizosaccharomyces pombe, have been particularly fruitful
in the recognition of new protein kinases. In these
genetically tractable organisms, the powerful approach of
mutant isolation and cloning by complementation has netted
dozens of protein kinase genes required for numerous
aspects of cell function (8). In many cases, vertebrate
counterparts have now been found for these genes,
leading to a growing awareness that protein phosphorylation
pathways that regulate basic aspects of cell physiology
have been maintained throughout the course of
eukaryotic evolution.

¹This article is based on an introductory chapter in the Protein
Kinase Factsbook, edited by D. G. Hardie and S. K. Hanks,
published in 1995 by Academic Press, London.

²To whom correspondence and reprint requests should be
addressed, at: Molecular Biology and Virology Laboratory, The
Salk Institute, 10010 N. Torrey Pines Rd, La Jolla, CA 92037,
USA.

³Abbreviations: PCR, polymerase chain reaction; PKA-Cα,
type α cAMP-dependent protein kinase catalytic subunit; Cdk2,
cyclin-dependent kinase 2; Erk2, p42 MAP kinase; APE,

Even though the overwhelming majority of protein
kinases identified from eukaryotic sources belong to this
superfamily, a small but growing number of such enzymes
do not qualify as superfamily members. Most of these are
related to the prokaryotic protein-histidine kinase family
(see below), which forms the sensor components of
two-component signal transduction systems (9). Included in
this category are a putative ethylene receptor encoded by
the flowering plant ETR1 gene (10), the product of the
budding yeast SLN1 gene (11, 12) thought to be involved
in relaying nutrient information to elements controlling
cell growth and division, the mitochondrial
branched-chain α-ketoacid dehydrogenase kinase (13),
and the mitochondrial pyruvate dehydrogenase kinase
(14). In prokaryotes, protein-histidine kinases
phosphorylate aspartates in their target proteins, but except for the
two dehydrogenase kinases that phosphorylate serine, the
acceptor specificities of most of the eukaryotic protein
kinases of this type are not known. In addition to these
protein kinases, the Bcr protein encoded by the breakpoint
cluster region gene involved in the Philadelphia
chromosome translocation (15) and the A6 kinase isolated by
expression cloning using an anti-phosphotyrosine
antibody (16) have kinase domains unrelated to any known
eukaryotic or prokaryotic kinase. In addition, true
protein-histidine kinases are known in eukaryotes. One such
enzyme has been extensively characterized from budding
yeast but not yet molecularly cloned (17), and so it is not
clear whether this enzyme will belong to the protein
kinase superfamily or use a novel structural principle for
phosphotransfer.

What about the prokaryotes? It has been known for
years that protein phosphorylation events play key
regulatory roles in numerous bacterial cell processes including
chemotaxis, bacteriophage infection, nutrient uptake,
and gene transcription (reviewed in refs 18, 19). The
bacterial protein kinases have been divided into three
general classes (20): 1) protein-histidine kinases such as
those functioning in two-component sensory regulatory
systems (strictly speaking, these are protein-aspartyl
kinases, because autophosphorylation on His is an
intermediary step in phosphotransfer to an aspartate in the
response-regulator protein) (9); 2) phosphotransferases
such as those of the phosphoenolpyruvate-dependent
phosphotransferase system involved in sugar uptake (21);
and 3) protein-serine kinases such as isocitrate
dehydrogenase kinase/phosphatase (22). Amino acid sequences
have been determined for members of each class, and all
are unrelated to the eukaryotic protein kinase
superfamily.

Recently, however, true homologs of the eukaryotic
protein kinases have been identified from two species of
bacteria, Yersinia pseudotuberculosis (7) and Myxococcus
xanthus (6, 23). Are these special cases, or the first examples
of many such genes in prokaryotes? The eukaryotic-like
protein kinase YpkA from the pathogenic enterobacteria
Y. pseudotuberculosis is encoded by a plasmid essential for
the virulence of this infectious organism. In addition to
YpkA, at least two other proteins encoded by genes
residing on the virulence plasmid exhibit high similarity to
eukaryotic proteins. Thus, it seems likely that the
virulence plasmid genes were transduced from a eukaryotic
host by horizontal transfer. The myxobacterium M.
xanthus presents a different and perhaps more intriguing
picture. Application of the PCR homology-based cloning
strategy revealed that at least eight genes encoding
members of the eukaryotic protein kinase superfamily are
present in the genome of this species (23). The myxobacteria
are unusual prokaryotes in that they undergo a complex
developmental cycle upon nutrient depletion, much like
that of the eukaryotic slime mold Dictyostelium. Given that
protein kinases are commonly involved in regulating
growth and differentiation of eukaryotic cells, it is
attractive to speculate that the eukaryotic-like protein kinases
in M. xanthus are specifically involved in regulating their
developmental cycle. Indeed, one of these kinases, Pkn1,
was shown to be required for proper fruiting body
formation. The same could be true for the eukaryotic-like
protein kinase PknA from Anabaena (24). In keeping with this
idea, neither the PCR approach applied to Escherichia coli
(23) nor extensive sequencing of the E. coli genome (now
30% complete) has yielded eukaryotic-like protein
kinases. Hence, genes encoding members of the eukaryotic
protein kinase superfamily may be present only in
bacteria that can undergo a developmental cycle. However,
unpublished reports of eukaryotic-like protein kinases in
Streptomyces coelicolor, and in three species of
Methanococcus, suggest that such genes are more widely expressed
among prokaryotes, and potentially these genes represent
the ancestors for the entire eukaryotic protein kinase
superfamily.



THE HOMOLOGOUS KINASE DOMAINS 

The kinase domains of eukaryotic protein kinases impart
the catalytic activity. Three separate roles can be ascribed
to the kinase domains: 1) binding and orientation of the
ATP (or GTP) phosphate donor as a complex with
divalent cation (usually Mg²⁺ or Mn²⁺); 2) binding and
orientation of the protein (or peptide) substrate; and 3)
transfer of the γ-phosphate from ATP (or GTP) to the
acceptor hydroxyl residue (Ser, Thr, or Tyr) of the
protein substrate.

Conserved features of primary structure 

The total number of distinct kinase domain amino acid
sequences available is now approaching 400 (Table 1).
Included in this total are the vertebrate enzymes encoded
by distinct paralogous genes, their presumed functional
homologs from invertebrates and simpler organisms
(encoded by orthologous genes), and those identified from
lower organisms and plants for which vertebrate
equivalents have not been found. Conserved features of kinase
domain primary structure have previously been identified
through an inspection of multiple amino acid sequence
alignments (1-3). The large number of sequences now
available precludes showing an alignment containing all
known kinase domains. Thus, in Fig. 1 only 60 different
kinase domain sequences are aligned. These are drawn,
however, from the widest possible sampling of the
superfamily and thus provide a good representation of the
AGC Group

AGC-I. Cyclic nucleotide-regulated protein kinase family

A. Cyclic AMP-dependent protein kinase (PKA) subfamily
vertebrate:

1. PKA-Cα: PKA catalytic subunit, alpha-form

2. PKA-Cβ: PKA catalytic subunit, beta-form

3. PKA-Cγ: PKA catalytic subunit, gamma-form
Drosophila melanogaster:

1. DmPKA-C0: PKA catalytic subunit, C0 form

2. DmPKA-C1: PKA catalytic subunit, C1 form

3. DmPKA-C2: PKA catalytic subunit, C2 form
Caenorhabditis elegans:

1. CePKA: PKA catalytic subunit homolog
Saccharomyces cerevisiae:

1. ScPKA-Tpk1: PKA catalytic subunit homolog, type 1
Schizosaccharomyces pombe:

1. SpPKA1: PKA catalytic subunit homolog
Dictyostelium discoideum:

1. DdPKA: PKA catalytic subunit
Aplysia californica:

1. AplC: PKA catalytic subunit homolog

2. Sak: "Spermatozoon-associated kinase"

B. Cyclic GMP-dependent protein kinase (PKG) subfamily
vertebrate:

1. PKG-I: PKG, type I

* 2. PKG-II: PKG, type II
Drosophila melanogaster:

1. DmPKG-G1: PKG homolog, type 1

2. DmPKG-G2: PKG homolog, type 2

C. Others

Dictyostelium discoideum:

1. DdPK1: PKA homolog

AGC-II. Diacylglycerol-activated/phospholipid-dependent protein kinase C (PKC) family

A. "Conventional" (Ca²⁺-dependent) protein kinase C (cPKC) subfamily
vertebrate:

1. cPKCα: Protein Kinase C, alpha-form

2. cPKCβ: Protein Kinase C, beta-form

3. cPKCγ: Protein Kinase C, gamma-form
Drosophila melanogaster:

1. DmPKC-53Ebr: PKC homolog expressed in brain, locus 53E

2. DmPKC-53Eey: PKC homolog expressed in eye, locus 53E
Aplysia californica:

1. Apl-I: PKC homolog, type I

B. "Novel" (Ca²⁺-independent) Protein Kinase C (nPKC) subfamily
vertebrate:

1. nPKCδ: Protein Kinase C, delta-form

2. nPKCε: Protein Kinase C, epsilon-form

3. nPKCη: Protein Kinase C, eta-form

4. nPKCθ: Protein Kinase C, theta-form
Drosophila melanogaster:

1. DmPKC-98F: PKC homolog, locus 98F
Aplysia californica:

1. Apl-II: PKC homolog, type II
Caenorhabditis elegans:

1. CePKC: PKC homolog, product of tpa-1 gene

* 2. CePKC1B: PKC homolog expressed in neurons and interneurons
Dictyostelium discoideum:

* 1. DdMHCK: PKC homolog
Saccharomyces cerevisiae:

1. ScPKA1: PKC homolog, product of PKC1 gene

* 2. ScPKA2: PKC homolog, product of PKC2 gene
Schizosaccharomyces pombe:

1. Pck1: "Pombe C-kinase", type 1

2. Pck2: "Pombe C-kinase", type 2

C. "Atypical" Protein Kinase C (aPKC) subfamily
vertebrate:

1. aPKCζ: Protein Kinase C, zeta-form

* 2. aPKCι: Protein Kinase C, iota-form

* 4. aPKCμ: Protein Kinase C, mu-form

ᵃMore information about the individual protein kinases listed (including sequence references) can be obtained by contacting the authors or by
consulting The Protein Kinase Factsbook (42). Protein kinases marked with asterisks (*) were not included in the phylogenetic analysis due to their
recent discovery. In many instances new protein kinases were cloned by more than one group; in these cases the most commonly accepted name is
used for the entry and alternative names are listed in parentheses after the entry. Protein kinase homologs from DNA viruses are not included in
this classification.
Table 1. (continued).



D. Others
vertebrate:

* 1. PKN: Protein kinase with PKC-related catalytic domain

AGC-III. Related to PKA and PKC (RAC) family
vertebrate:

1. RACα: RAC, alpha-form; cellular homolog of v-Akt oncoprotein

2. RACβ: RAC, beta-form
Drosophila:

1. DmRAC: RAC homolog

Caenorhabditis elegans:

* 1. CeRAC: RAC homolog

AGC-IV. Family of kinases that phosphorylate G protein-coupled receptors
vertebrate:

1. βARK1: β-adrenergic receptor kinase, type 1

2. βARK2: β-adrenergic receptor kinase, type 2

3. RhK: Rhodopsin kinase

* 4.m 1: Gi)rotcin<oupled receptor kinase homolog 

* 5.GRK5: G-protein<oupled receptor kinase, type 5 

* 6. GRK6: G^rotein<oupled receptor kinase, type 6 
DrosophUa wieianogast^ 

1. DmGPRRl: Drosophila G-protein<oupled receptor kinase, type 1 

2. DmCPRK2: Drosophib Ci>rotein<oupled receptor kinase, type 2 

AGC-V. Family of budding yeast AGOrdated kinases 
Saccharmyces ctnvisias: 

1. Sch9: Suppressor of defects in cAMP effector pathway 

2. Ykr2: AGCrelated kinase 
S.Ypkl:* AGCrelated kinase 

AGC-VI. Family of kinases that phosphorylate ribosomal S6 protein

1. S6K: 70 kDa S6 kinase with single catalytic domain

2. RSK1(Nt): 90 kDa S6 kinase, type 1

3. RSK2(Nt): 90 kDa S6 kinase, type 2

[Note: The RSK enzymes have two distinct catalytic domains. The Nt-domain is closely related to S6K, whereas the
Ct-domain is most closely related to phosphorylase kinase]

AGC-VII. Budding yeast Dbf2/20 family
Saccharomyces cerevisiae:

1. Dbf2: Product of gene periodically expressed in cell cycle

2. Dbf20: Close relative of DBF2 not under cell cycle control

AGC-VIII. Flowering plant "PVPK1 family" of protein kinase homologs
Phylum Angiospermophyta (Kingdom Plantae):

1. PvPK1: Bean protein kinase homolog

2. OsG11A: Rice protein kinase homolog

3. ZmPPK: Maize protein kinase homolog

4. AtPK5: Arabidopsis protein kinase homolog

5. AtPK7: Arabidopsis protein kinase homolog

6. AtPK64: Arabidopsis protein kinase homolog

7. PsPK5: Pea protein kinase homolog

Other AGC-related kinases
vertebrate:

1. DMPK: "Myotonic Dystrophy Protein Kinase"

2. Sgk: "Serum and glucocorticoid regulated kinase"

* 3. Mast205: Spermatid "Microtubule-associated serine/threonine kinase"
Neurospora crassa:

1. NcCot-1: Product of gene required for normal colonial growth

Dictyostelium discoideum:

1. Ddk2: Product of developmentally-regulated gene

Saccharomyces cerevisiae:

1. ScSpk1: Dual-specificity kinase

Phylum Angiospermophyta (Kingdom Plantae):

* 1. Atpk1: Arabidopsis protein kinase

CaMK Group 

CaMK-I. Family of kinases regulated by Ca2+/calmodulin, and close relatives
A. Subfamily including "Multifunctional" Ca2+/Calmodulin Kinases (CaMKs)
vertebrate:

1. CaMK1: CaMK, type I

2. CaMK2α: CaMK, type II, alpha subunit

3. CaMK2β: CaMK, type II, beta subunit

4. CaMK2γ: CaMK, type II, gamma subunit

5. CaMK2δ: CaMK, type II, delta subunit

* 6. EF2K: Elongation Factor-2 Kinase or CaMK type III

7. CaMK4: CaMK type IV
Drosophila melanogaster:

1. DmCaMK2: CaMK-II homolog

Saccharomyces cerevisiae:

1. ScCaMK2-1: CaMK-II homolog, product of CMK1 gene

2. ScCaMK2-2: CaMK-II homolog, product of CMK2 gene
Aspergillus nidulans:

1. AnCaMK2: CaMK-II homolog

B. Subfamily including phosphorylase kinases
vertebrate:

1. PhK-γM: Skeletal muscle phosphorylase kinase catalytic subunit

2. PhK-γT: Male germ cell phosphorylase kinase catalytic subunit

3. RSK1(Ct): 90 kDa S6 kinase, type 1; C-terminal catalytic domain

4. RSK2(Ct): 90 kDa S6 kinase, type 2; C-terminal catalytic domain

C. Subfamily including myosin light chain kinases
vertebrate:

1. skMLCK: Skeletal muscle MLCK (rabbit)

2. smMLCK: Smooth muscle MLCK (rabbit)

3. Titin: Huge protein implicated in skeletal muscle development
Caenorhabditis elegans:

1. Twn: "Twitchin" protein involved in muscle contraction or development

Dictyostelium discoideum:

1. DdMLCK: Slime mold myosin light chain kinase

D. Subfamily of plant kinases with intrinsic calmodulin-like domain
Phylum Angiospermophyta (Kingdom Plantae):

1. CDPK: Soybean Ca2+-regulated kinase with intrinsic CaM-like domain

2. AtAK1: Arabidopsis CDPK homolog

* 3. OsSpk: Rice CDPK homolog

* 4. DcPk431: Carrot CDPK homolog

E. Subfamily of plant kinases with highly acidic domain
Phylum Angiospermophyta (Kingdom Plantae):

* 1. ASK1: Arabidopsis protein kinase homolog with highly acidic domain

* 2. ASK2: Arabidopsis protein kinase homolog with highly acidic domain

F. Other CaMK-related kinases
vertebrate:

1. PskH1: Putative protein-serine kinase

* 2. MAPKAP2: "MAP Kinase-Activated Protein Kinase 2"
Saccharomyces cerevisiae:

1. Mre4: Protein required for meiotic recombination

* 2. Dun1: Protein required for DNA damage-inducible gene expression

* 3. Rck1: "Radiation sensitivity complementing kinase, type 1"

* 4. Rck2: "Radiation sensitivity complementing kinase, type 2"



CaMK-II. Snf1/AMPK family
vertebrate:

* 1. AMPK: "AMP-Activated Protein Kinase"

2. p78: Protein lost in carcinomas of human pancreas
Saccharomyces cerevisiae:

1. Snf1: Kinase essential for release from glucose repression

2. Kin1: Protein kinase with N-terminal catalytic domain

3. Kin2: Close relative of KIN1

4. Ycl24: Protein kinase homolog on chromosome III

* 5. Ykl453: Protein kinase homolog on chromosome XI
Schizosaccharomyces pombe:

1. SpKin1: Product of gene important for growth polarity

2. Nim1: Inducer of mitosis
Phylum Angiospermophyta (Kingdom Plantae):

1. PSnf1-RKIN1: Rye putative protein kinase that complements yeast snf1 polarity

2. PSnf1-AKIN10: Arabidopsis putative protein kinase related to SNF1

3. PSnf1-BKIN12: Barley protein related to SNF1

* 4. PKABA1: Wheat kinase induced by abscisic acid

* 5. WPK4: Wheat kinase homolog regulated by light and nutrients

* 6. NPK5: Tobacco Snf1 homolog, activates SUC2 gene expression

Other CaMK Group Kinases

Plasmodium falciparum (malarial parasite):

1. PfCPK: Ca2+-regulated kinase with intrinsic CaM-like domain

2. PfPK2: Putative protein kinase

CMGC Group

CMGC-I. Family of cyclin-dependent kinases (CDKs) and other close relatives
vertebrate:

1. Cdc2: Inducer of mitosis; functional homolog of yeast cdc2+/CDC28 kinases (Cdk1)

2. Cdk2: Type 2 cyclin-dependent kinase

3. Cdk3: Type 3 cyclin-dependent kinase

4. Cdk4: Type 4 cyclin-dependent kinase

5. Cdk5: Type 5 cyclin-dependent kinase

6. Cdk6: Type 6 cyclin-dependent kinase

7. PCTAIRE1: Cdc2-related protein

8. PCTAIRE2: Cdc2-related protein

9. PCTAIRE3: Cdc2-related protein

10. Mo15: "Cdk-activating kinase"; Negative regulator of meiosis (CAK)
Drosophila melanogaster:

1. DmCdc2: Functional homolog of yeast cdc2+/CDC28 kinases

2. DmCdc2c: Cdc2-cognate protein; Cdk2 homolog
Dictyostelium discoideum:

1. DdCdc2: Functional homolog of yeast cdc2+/CDC28 kinases

2. DdPRK: "Cdc2-related PCTAIRE Kinase"
Aspergillus nidulans:

1. NIMXcdc2: Cdc2-related gene product
Plasmodium falciparum:

1. PfPK5: Cdc2-related protein from human malarial parasite
Entamoeba histolytica:

1. EhC2R: Cdc2-related protein
Crithidia fasciculata:

1. CfCdc2R: Cdc2-related protein
Leishmania mexicana:

* 1. LmCRK1: "Cdc2-Related Kinase"
Saccharomyces cerevisiae:

1. Cdc28: "Cell-division-cycle" gene product

2. Pho85: Negative regulator of the PHO system and cell cycle regulator

3. Kin28: CDC28-related protein
Schizosaccharomyces pombe:

1. SpCdc2: "Cell-division-cycle" gene product
Histoplasma capsulatum:

* 1. HcCdc2: Cdc2 homolog from dimorphic fungus
Phylum Angiospermophyta (Kingdom Plantae):

1. Pcdc2: Flowering plant Cdc2 homolog that complements yeast mutants

* 2. MsCdc2B: Alfalfa Cdc2 cognate gene product that complements G1/S transition

3. OsC2R: More distantly related Cdc2 homolog from rice

CMGC-II. Erk (MAP kinase) family

1. Erk1: "Extracellular signal-regulated kinase", type 1 (p44 MAP kinase)

2. Erk2: "Extracellular signal-regulated kinase", type 2 (p42 MAP kinase)

3. Erk3: Somewhat distant relative of the Erk/MAP kinases

4. p63MAPK: Another more distant relative of the Erk/MAP kinases

5. SAPKα: "Stress-activated protein kinase, type alpha" (JNK2)

6. SAPKβ: "Stress-activated protein kinase, type beta"

7. SAPKγ/Jnk1: "Stress-activated protein kinase, type gamma" or "Jun N-terminal Kinase"

8. p38: HOG1-related protein (MPK2)
Drosophila melanogaster:

1. DmErkA: Homolog of Erk/MAP kinases; product of rolled gene
Caenorhabditis elegans:

* 1. Sur1: Erk/MAP kinase homolog
Saccharomyces cerevisiae:

1. Kss1: Suppressor of sst2 mutant, overcomes growth arrest

2. Fus3: Product of gene required for growth and mating

3. Slt2: Product of gene complementing lyt2 mutants (MPK1)

* 4. Hog1: Product of gene required for osmoregulation
Schizosaccharomyces pombe:

1. Spk1: Product of gene that confers drug resistance to staurosporine, a PK inhibitor

Phylum Deuteromycota (Kingdom Fungi):

1. CaErk1: Protein that interferes with mating factor-induced cell cycle arrest
Trypanosoma brucei (Phylum Zoomastigina, Kingdom Protoctista):

* 1. KFR1: "KSS1- and FUS3-related" gene product

Phylum Angiospermophyta (Kingdom Plantae):

1. PErk: Flowering plant Erk/MAP kinase homologs (7 distinct homologs identified in Arabidopsis)

CMGC-III. Glycogen synthase kinase 3 (GSK3) family
vertebrate:

1. GSK3α: Glycogen synthase kinase 3, α-form

2. GSK3β: Glycogen synthase kinase 3, β-form
Drosophila melanogaster:

1. Sgg: Product of shaggy/zeste-white 3 gene
Saccharomyces cerevisiae:

1. Mck1: "Meiosis and centromere regulatory kinase"

* 2. ScGSK3: Protein closely related to MCK1

* 3. Mds1: Dosage suppressor of mck1 mutant
Dictyostelium discoideum:

* 1. DdGSK3: Glycogen synthase kinase 3 homolog
Phylum Angiospermophyta (Kingdom Plantae):

1. ASK-α: "Arabidopsis shaggy-related protein kinase", type alpha

2. ASK-γ: "Arabidopsis shaggy-related protein kinase", type gamma




vertebrate:

1. CK2α: Casein kinase II, alpha subunit

2. CK2α': Casein kinase II, alpha-prime subunit
Drosophila melanogaster:

1. DmCK2: Casein kinase II homolog
Caenorhabditis elegans:

1. CeCK2: Casein kinase II homolog
Theileria parva (a protozoan parasite):

1. TpCK2: Casein kinase II α-subunit homolog
Dictyostelium discoideum:

1. DdCK2: Casein kinase II, α-subunit
Saccharomyces cerevisiae:

1. ScCK2α: Casein kinase II, alpha subunit

2. ScCK2α': Casein kinase II, alpha-prime subunit
Schizosaccharomyces pombe:

* 1. SpCka1: Casein kinase II, α-subunit homolog (Orb5)
Phylum Angiospermophyta (Kingdom Plantae):

1. ZmCK2: Flowering plant casein kinase II, α-subunit homolog



CMGC-IV. Clk family
vertebrate:

1. Clk: "Cdc-like kinase"

* 2. Srpk1: Kinase that regulates intracellular localization of splicing factors

3. PskG1: Putative protein kinase

4. PskH2: Putative protein kinase
Drosophila melanogaster:

* 1. Doa: Kinase encoded by "Darkener of Apricot" locus
Saccharomyces cerevisiae:

1. Yak1: Suppressor of RAS mutant

2. Kns1: Nonessential protein kinase homolog
Schizosaccharomyces pombe:

1. Dsk1: Dis1-suppressing protein kinase implicated in mitotic control

* 2. Prp4: Pre-mRNA processing gene product; lacks subdomains X-XI

Other CMGC Group kinases
vertebrate:

1. Mak: "Male germ cell-associated kinase"

2. Ched: "Cholinesterase-related cell division controller"

3. PITSLRE: Galactosyltransferase-associated kinase

4. KKIALRE: Cdc2-related protein

* 5. PITALRE: Cdc2-related kinase

* 6. PISSLRE: Cdc2-related kinase
Saccharomyces cerevisiae:

1. Sme1: Product of gene essential for start of meiosis

2. Sgv1: Kinase required for G-protein-mediated adaptive response to pheromone

3. Ctk1: Product of gene required for normal growth
Phylum Angiospermophyta (Kingdom Plantae):

1. Mhk: Arabidopsis thaliana "Mak homologous kinase"



Conventional Protein-Tyrosine Kinase Group (I-X: Non-membrane-spanning; XI-XXIII: Membrane-spanning)
PTK-I. Src family
vertebrate:

1. Src: Cellular homolog of Rous sarcoma virus oncoprotein

2. Yes: Cellular homolog of Yamaguchi 73 sarcoma virus oncoprotein

3. Yrk: Yes-related kinase

4. Fyn: Protein related to Fgr and Yes

5. Fgr: Cellular homolog of Gardner-Rasheed sarcoma virus oncoprotein

6. Lyn: Protein related to Fgr and Yes

7. Hck: Hematopoietic cell protein-tyrosine kinase

8. Lck: Lymphoid T-cell protein-tyrosine kinase

9. Blk: Lymphoid B-cell protein-tyrosine kinase

* 10. Frk: Fyn-related kinase

* 11. Rak: STK-related kinase

* 12. Fyk: "Fyn and Yes-related kinase" from electric ray
Drosophila melanogaster:

1. DmSrc: Src homolog, polytene locus 64B

Dugesia (Girardia) tigrina (Phylum Platyhelminthes):

* 1. DtSpk-1: "Src-like planarian kinase"
Hydra vulgaris (Phylum Cnidaria):

1. Stk: Src-related protein

Spongilla lacustris (Phylum Porifera):

1. Srk1-4: Four distinct Src-related kinases



PTK-II. Brk family
vertebrate:

* 1. Brk: Protein-tyrosine kinase expressed in human breast tumors

PTK-III. Tec family
vertebrate:

1. Tec: Tyrosine kinase expressed in hepatocellular carcinoma

2. Emt: "Expressed mainly in T-cells" kinase (Itk, Tsk)

3. Btk: "Bruton's agammaglobulinaemia tyrosine kinase" (Emb)

* 4. Txk: Tec-related protein-tyrosine kinase
Drosophila melanogaster:

1. DmTec: Tec homolog, polytene locus 28C

PTK-IV. Csk family
vertebrate:

1. Csk: "C-terminal Src Kinase"; negative regulator of Src

* 2. MatK: "Megakaryocyte-associated Tyr-kinase" (Hyl, Lsk, Ctk, Ntk)

PTK-V. Fes (Fps) family
vertebrate:

1. Fes/Fps: Cellular homolog of feline and avian sarcoma viruses

2. Fer: "Fes/Fps-related" kinase
Drosophila melanogaster:

1. DmFer: Fer-related protein

PTK-VI. Abl family
vertebrate:

1. Abl: Cellular homolog of Abelson murine leukemia virus

2. Arg: "Abl-related gene" product
Drosophila melanogaster:

1. DmAbl: Abl-related protein
Caenorhabditis elegans:

1. CeAbl: Nematode Abl-related protein

PTK-VII. Syk/Zap70 family
vertebrate:

1. Syk: "Spleen tyrosine kinase"

2. Zap70: T-cell receptor "zeta chain-associated protein of 70 kDa"
Hydra vulgaris (Phylum Cnidaria):

* 1. Htk16: Syk/Zap70-related

PTK-VIII. Jak family
vertebrate:

1. Tyk2: Transducer of interferon α/β signals

2. Jak1: "Janus kinase", type 1

3. Jak2: "Janus kinase", type 2

* 4. Jak3: "Janus kinase", type 3
Drosophila melanogaster:

* 1. Hop: Product of hopscotch gene required for establishing segmental body plan

PTK-IX. Ack
vertebrate:

* 1. Ack: "CDC42Hs-associated kinase"

PTK-X. Fak
vertebrate:

1. Fak: "Focal adhesion kinase"

PTK-XI. Epidermal growth factor receptor family
vertebrate:

1. EGFR: Epidermal growth factor receptor

2. ErbB2: Cell homolog of oncogene activated in ENU-induced rat neuroblastoma (Neu, HER2)

3. ErbB3: Receptor tyrosine kinase related to EGFR (HER3)

4. ErbB4: Receptor tyrosine kinase related to EGFR (Tyro2)
Drosophila melanogaster:

1. DER: Homolog of EGF receptor
Caenorhabditis elegans:

1. LET-23: Product of gene required for normal vulval development
Schistosoma mansoni (Phylum Platyhelminthes):

1. SER: EGF receptor homolog

PTK-XII. Eph/Elk/Eck receptor family
vertebrate:

1. Eph: Kinase detected in "erythropoietin-producing hepatoma"

2. Eck: "Epithelial cell kinase"

3. Eek: Eph/Elk-related protein-tyrosine kinase

4. Hek: Eph/Elk-related protein-tyrosine kinase (Cek4)

5. Sek: "Segmentally-expressed kinase"

6. Elk: "Eph-like kinase" detected in brain

* 7. Hek2: "Human embryo kinase" type 2 (Cek10)

* 8. Htk: "Hepatoma transmembrane kinase"

9. Cek5/Nuk: "Chicken embryo kinase 5"/"Neural kinase"

* 10. Ehk1: "Eph homology kinase-1" (Cek7)

* 11. Ehk2: "Eph homology kinase-2"

* 12. Myk1: "Mammary-derived tyrosine kinase, type 1"

13. Myk2: "Mammary-derived tyrosine kinase, type 2"

14. Cek9: "Chicken embryo kinase 9"

15. Pag: "Pagliaccio" Xenopus protein expression in neural crest and neural tissues

16. Rtk1: Zebrafish Eph/Elk-related protein-tyrosine kinase

17. Rtk2: Zebrafish Eph/Elk-related protein-tyrosine kinase

18. Rtk3: Zebrafish Eph/Elk-related protein-tyrosine kinase

PTK-XIII. Axl family
vertebrate:

1. Axl: "Anexelekto" (Gr. "uncontrolled") tyrosine kinase (UFO, Ark)

2. Eyk: Cellular homolog of RPL30 avian oncoprotein (c-Ryk)

* 3. Brt/Sky/Tif/Rse: "Brain tyrosine kinase"/"Sea related protein tyrosine kinase"/"Tyrosine kinase with Ig-like and FN-III-like domains"/"Receptor sectaris" (Tyro3)

PTK-XIV. Tie/Tek family
vertebrate:

1. Tie: "Tyrosine kinase with Ig and EGF homology"

2. Tek: "Tunica interna endothelial cell kinase" (Tie2)

PTK-XV. Platelet-derived growth factor receptor family

A. Subfamily with 5 Ig-like extracellular domains
vertebrate:

1. PDGFRα: Platelet-derived growth factor receptor, type alpha

2. PDGFRβ: Platelet-derived growth factor receptor, type beta

3. CSF1R: Colony-stimulating factor-1 receptor (c-Fms)

4. Kit: Steel growth factor receptor

5. Flk2: "Fetal liver kinase-2" (Flt3)

B. Subfamily with 7 Ig-like extracellular domains
vertebrate:

1. Flt1: "Fms-like tyrosine kinase", type 1

2. Flt4: "Fms-like tyrosine kinase", type 4

3. Flk1: "Fetal liver kinase-1" (KDR)



PTK-XVI. Fibroblast growth factor receptor family
vertebrate:

1. FGFR1: Fibroblast growth factor receptor, type 1 (Flg, Cek1)

2. FGFR2: Fibroblast growth factor receptor, type 2 (Bek, K-SAM, Cek3)

3. FGFR3: Fibroblast growth factor receptor, type 3

4. FGFR4: Fibroblast growth factor receptor, type 4
Drosophila melanogaster:

1. DmFGFR1: Fibroblast growth factor receptor homolog, type 1

* 2. DmFGFR2: Fibroblast growth factor receptor homolog, type 2

PTK-XVII. Insulin receptor family
vertebrate:

1. InsR: Insulin receptor

2. IGF1R: Insulin-like growth factor receptor

3. IRR: Insulin receptor-related protein
Drosophila melanogaster:

1. DmInsR: Homolog of insulin receptor

PTK-XVIII. Ltk/Alk family
vertebrate:

1. Ltk: "Leukocyte tyrosine kinase"

* 2. Alk: "Anaplastic lymphoma kinase"

PTK-XIX. Ros/Sev family
vertebrate:

1. Ros: Cellular homolog of UR2 avian sarcoma virus oncoprotein
Drosophila melanogaster:

1. Sev: Product of sevenless gene required for R7 photoreceptor cell development

PTK-XX. Trk/Ror family
vertebrate:

1. TrkA: High molecular weight nerve growth factor receptor

2. TrkB: Receptor for brain-derived neurotrophic factor and neurotrophin-4/5

3. TrkC: Trk-related protein; receptor for neurotrophin-3

4. Ror1: "Ror" putative receptor, type 1

5. Ror2: "Ror" putative receptor, type 2

6. TcRTK: Trk-related receptor (electric ray)
Drosophila melanogaster:

* 1. Dror: Putative neurotrophic receptor

PTK-XXI. Ddr/Tkt family

* 1. Ddr: "Discoidin Domain Receptor" (TrkE, CAK, NEP, Ptk3)

* 2. Tkt: "Tyrosine Kinase Related to Trk" (Tyro10)

PTK-XXII. Hepatocyte growth factor receptor family
vertebrate:

1. HGFR: Hepatocyte growth factor receptor (MET)

2. Sea: Cellular homolog of S13 avian erythroleukemia virus oncoprotein

3. Ron: "Recepteur d'Origine Nantaise"

* 4. Stk: "Stem cell-derived tyrosine kinase"

PTK-XXIII. Nematode Kin15/16 family
Caenorhabditis elegans:

1. CeKin15: PTK expressed during hypodermal development

2. CeKin16: PTK expressed during hypodermal development

Other membrane-spanning protein-tyrosine kinases (each with no close relatives)
vertebrate:

1. Ret: Normal homolog of oncoprotein activated by recombination

2. Klg: "Kinase-like gene" product

* 3. Nyk/Ryk: "Novel tyrosine kinase-related protein" (VIK, Mrk, Nbtk1)
Drosophila melanogaster:

1. Torso: Product of torso gene required for embryonic anterior/posterior determination

2. DmTrk: Distant relative of the mammalian trk gene

Marine sponge (Geodia cydonium):

* 1. GCTK: Putative receptor PTK



Other protein kinase families (not falling into major groups)

O-I. Polo family
vertebrate:

1. Plk: "Polo-like kinase"

2. Snk: "Serum-inducible kinase"

* 3. Sak: Polo-related kinase isolated in screen for genes regulating sialylation
Drosophila melanogaster:

1. Polo: Protein kinase homolog required for mitosis
Saccharomyces cerevisiae:

1. Cdc5: Product of gene required for cell cycle progression

O-II. MEK/STE7 family
vertebrate:

1. MEK1: "MAP ERK Kinase", type 1

2. MEK2: "MAP ERK Kinase", type 2
Drosophila melanogaster:

1. Dsor1:
Saccharomyces cerevisiae:

1. Ste7: Kinase required for haploid-specific gene expression

2. Pbs2: Kinase required for antibiotic drug resistance

3. Mkk1: "MAP Kinase Kinase", type 1 (suppresses lysis defect of pkc1 mutant)

4. Mkk2: "MAP Kinase Kinase", type 2 (suppresses lysis defect of pkc1 mutant)
Schizosaccharomyces pombe:

1. Byr1: Kinase that suppresses ras1-mutant sporulation defect

2. Wis1: Suppressor of cdc phenotype in triple mutant cdc25/wee1/win1 strains

O-III. MEKK/Ste11 family
vertebrate:

* 1. MEKK: "MEK Kinase"
Saccharomyces cerevisiae:

1. Ste11: Protein required for cell-type-specific transcription

2. Bck1: "Bypass of C kinase" kinase
Schizosaccharomyces pombe:

1. Byr2: Product of gene required for pheromone signal transduction
Phylum Angiospermophyta (Kingdom Plantae):

1. NPK1: Flowering plant (tobacco) homolog of Bck1

O-IV. Pak/Ste20 family
vertebrate:

* 1. Pak: "p21-(Cdc42/Rac) activated kinase"
Saccharomyces cerevisiae:

1. Ste20: Product of gene required for pheromone response

O-V. NimA family
vertebrate:

1. Nek1: NimA-related kinase

* 2. Nek2: NimA-related kinase (Nlk1)

* 3. Nek3: NimA-related kinase

* 4. Nrk2: NimA-related kinase

* 5. Stk1: NimA-related kinase
Aspergillus nidulans:

1. NIMA: Cell cycle control protein kinase
Drosophila melanogaster:

1. Fused: Product of gene required for segment polarity
Trypanosoma brucei (Phylum Zoomastigina, Kingdom Protoctista):

1. NrkA: Trypanosome protein kinase related to NimA
Saccharomyces cerevisiae:

1. Kin3: Putative protein kinase

O-VI. wee1/mik1 family

1. Wee1Hu: Gene product able to complement S. pombe wee1 mutant
Saccharomyces cerevisiae:

* 1. Swe1: Wee1 homolog from budding yeast
Schizosaccharomyces pombe:

1. SpWee1: "Wee" size at division kinase; Cdc2 negative regulator

2. Mik1: "Mitosis inhibitory kinase", negative regulator of Cdc2



O-VII. Family of kinases involved in translational control
vertebrate:

1. HRI: "Heme-regulated eukaryotic initiation factor 2α kinase"

2. PKR: "Double-stranded RNA-dependent kinase" (Tik)
Saccharomyces cerevisiae:

1. Gcn2: Protein required for translational derepression

O-VIII. Raf family
vertebrate:

1. Raf-1: Cellular homolog of retroviral oncogene product

2. A-Raf: Oncogenic protein closely related to c-Raf

3. B-Raf: Oncogenic protein closely related to c-Raf
Drosophila melanogaster:

1. DmRaf: Raf homolog
Caenorhabditis elegans:

1. CeRaf: Raf homolog; product of lin-45 gene required for vulval differentiation
Phylum Angiospermophyta (Kingdom Plantae):

1. Ctr1: Negative regulator of ethylene response pathway

O-IX. Activin/TGFβ receptor family

A. Subfamily of type I receptors
vertebrate:

1. ActR-I: Type I receptor for activin and TGF-β (Tsk7L, SKR1, ALK-2)

* 2. TSR-I: Type I receptor for activin and TGF-β (ALK-1)

* 3. TGFβRI: Type I receptor TGF-β (ALK-5)

* 4. ActR-IB: Type I receptor for activin (ALK-4)

* 5. BRK-1: Type I receptor for BMP-2 and BMP-4 (ALK-3)

* 6. ALK-6: "Activin receptor-like kinase", type 6
Drosophila melanogaster:

* 1. DmAtr-I: Type I activin receptor homolog

* 2. DmSax: Product of saxophone gene

B. Subfamily of type II receptors
vertebrate:

1. ActRII: Type II receptor for activin

2. ActRIIB: Type II receptor for activin

3. TGFβRII: Type II receptor TGF-β

* 4. C14: Putative receptor kinase expressed in gonads
Drosophila melanogaster:

* 1. DmAtr-II: Type II activin receptor homolog
Caenorhabditis elegans:

* 1. DAF-4: Larva development regulatory protein; BMP receptor

C. Others
Caenorhabditis elegans:

1. DAF-1: Product of gene required for vulval development



O-X. Flowering plant putative receptor kinase family
Phylum Angiospermophyta (Kingdom Plantae):

1. ZmPK1: Putative receptor protein-serine kinase (maize)

2. Srk: "S receptor kinase"; three distinct alleles: 2, 6, and 910 (Brassica)

3. Tmk1: Putative "Transmembrane receptor kinase" (Arabidopsis)

4. Apk1: Kinase that phosphorylates Tyr, Ser, and Thr (Arabidopsis)

* 5. Nak: "Novel Arabidopsis Kinase" (Arabidopsis)

6. Pro25: Putative kinase selected for specificity to thylakoid membrane protein (Arabidopsis)

* 7. Pto: Product of gene conferring pathogen resistance (tomato)

* 8. TmkL1: Transmembrane protein with unusual kinase-like domain (Arabidopsis)

* 9. Prk1: Pollen-expressed receptor-like putative kinase (Petunia)

O-XI. Family of "mixed-lineage" kinases with leucine zipper domain
vertebrate:

1. Mlk1: "Mixed lineage kinase", type 1

2. Mlk2: "Mixed lineage kinase", type 2

3. Mlk3: "Mixed lineage kinase", type 3 (PTK1, SPRK)

O-XII. Casein kinase I family
vertebrate:

1. CK1α: Casein kinase I, type alpha

2. CK1β: Casein kinase I, type beta

3. CK1γ: Casein kinase I, type gamma

4. CK1δ: Casein kinase I, type delta
Saccharomyces cerevisiae:

1. Yck1: Budding yeast casein kinase I homolog, type 1

2. Yck2: Budding yeast casein kinase I homolog, type 2

3. Hrr25: Kinase required for DNA repair
Schizosaccharomyces pombe:

* 1. Hhp1: Fission yeast casein kinase I homolog, type 1

* 2. Hhp2: Fission yeast casein kinase I homolog, type 2

O-XIII. PKN family of prokaryotic protein kinases

Myxococcus xanthus (Phylum Myxobacteria, Kingdom Prokaryotae):

1. Pkn1: Protein kinase homologous to eukaryotic kinases

2. Pkn2: Protein kinase required for maintenance of stationary phase cells and development

Other protein kinase family members (each with no known close relatives)
vertebrate:

1. Mos: Cellular homolog of retroviral oncogene product

2. Pim1: Protooncogene activated by murine leukemia virus

3. Cot: Product of oncogene expressed in human thyroid carcinoma

4. Esk: "Embryonal carcinoma STY kinase"; dual specificity (PYT)

* 5. GC kinase: Kinase expressed in germinal center B cells

* 6. Slk: STE20-related kinase

* 7. LIMK: "LIM motif-containing kinase"

* 8. Tsk1: "Testis-specific kinase"
Drosophila melanogaster:

1. NinaC: Product of gene essential for photoreceptor function

2. Pelle: Product of gene required for dorsal-ventral polarity

* 3. Nemo: Product of gene required for rotation of photoreceptor clusters
Dictyostelium discoideum:

1. SplA: Spore lysis A protein kinase

2. Dpyk2: Developmentally-regulated tyrosine kinase, type 2
Ceratodon purpureus (a moss):

1. PhyCer: Putative protein-tyrosine kinase encoded by a phytochrome gene
Saccharomyces cerevisiae:

1. Cdc7: "Cell-division-cycle" control gene product

2. CDC15: "Cell-division-cycle" control gene product

3. Vps15: Product of gene essential for sorting to lysosome-like vacuole

4. Npr1: Product of gene required for activity of ammonia-sensitive amino acid permeases

5. Elm1: Product of gene required for yeast-like cell morphology

6. Ire1: Required for myo-inositol synthesis and signaling from ER to the nucleus

7. Ykl516: Putative protein kinase gene on chromosome XI

* 8. Ipl1: Product of gene required for chromosome segregation
Schizosaccharomyces pombe:

1. Ran1: Product of gene required for normal meiotic function

2. Chk1: "Checkpoint Kinase" that links rad pathway to Cdc2

* 3. Csk1: "Cyclin Suppressing Kinase"

* 4. RPK1: "Regulatory cell proliferation kinase"
Entamoeba histolytica (Phylum Rhizopoda, Kingdom Protoctista):

1. Ehmfli 1: Distant relative of Mos

Phylum Angiospermophyta (Kingdom Plantae):

1. GmPK6: Protein kinase homolog (soybean)

* 2. Tsl: Product of Tousled gene required for normal leaf/flower development (Arabidopsis)

Yersinia pseudotuberculosis (Phylum Omnibacteria, Kingdom Prokaryotae):

1. YpkA: Enterobacterial protein kinase essential for virulence



known primary structures. The kinase domains are further divided
into 12 smaller subdomains (indicated by Roman numerals), defined as
regions never interrupted by large amino acid insertions and
containing characteristic patterns of conserved residues (consensus
line in Fig. 1).

Twelve kinase domain residues are recognized as being invariant or
nearly invariant throughout the superfamily (conserved in over 95% of
370 sequences), and hence strongly implicated as playing essential
roles in enzyme function. Using the type α cAMP-dependent protein
kinase catalytic subunit (PKA-Cα) as a reference point, these are
equivalent to Gly50 and Gly52 in subdomain I, Lys72 in subdomain II,
Glu91 in subdomain III, Asp166 and Asn171 in subdomain VIB, Asp184 and
Gly186 in subdomain VII, Glu208 in subdomain VIII, Asp220 and Gly225
in subdomain IX, and Arg280 in subdomain XI.
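
The residue list above can be held as a small lookup table. The following is only an illustrative sketch, not part of the published analysis: the names `INVARIANT_RESIDUES`, `by_subdomain`, and `is_nearly_invariant` are hypothetical, and the toy alignment column is invented for the example. The 0.95 threshold is taken from the "over 95%" criterion stated in the text.

```python
from collections import defaultdict

# (three-letter residue, PKA-C alpha position, subdomain), as listed in the text.
INVARIANT_RESIDUES = [
    ("Gly", 50, "I"), ("Gly", 52, "I"),
    ("Lys", 72, "II"),
    ("Glu", 91, "III"),
    ("Asp", 166, "VIB"), ("Asn", 171, "VIB"),
    ("Asp", 184, "VII"), ("Gly", 186, "VII"),
    ("Glu", 208, "VIII"),
    ("Asp", 220, "IX"), ("Gly", 225, "IX"),
    ("Arg", 280, "XI"),
]

# Standard one-letter codes for the residues used above.
ONE_LETTER = {"Gly": "G", "Lys": "K", "Glu": "E", "Asp": "D", "Asn": "N", "Arg": "R"}


def by_subdomain(residues=INVARIANT_RESIDUES):
    """Group residue labels such as 'Lys72' under their subdomain numeral."""
    grouped = defaultdict(list)
    for name, position, subdomain in residues:
        grouped[subdomain].append(f"{name}{position}")
    return dict(grouped)


def is_nearly_invariant(column, expected, threshold=0.95):
    """Return True if more than `threshold` of an alignment column is `expected`.

    `column` is a string of one-letter codes, one per aligned sequence
    ('-' marks a gap). This is a hypothetical helper for illustration.
    """
    if not column:
        return False
    return column.count(expected) / len(column) > threshold


if __name__ == "__main__":
    for subdomain, labels in by_subdomain().items():
        print(subdomain, ", ".join(labels))
    toy_column = "K" * 97 + "R" * 3  # invented column, not real alignment data
    print(is_nearly_invariant(toy_column, ONE_LETTER["Lys"]))
```

Keeping the positions keyed to PKA-Cα follows the text's choice of reference enzyme; mapping them onto another kinase would require that kinase's own alignment, which is not attempted here.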

The patterns of amino acid residues found within subdomains VIB,
VIII, and IX have been particularly well-conserved among the
individual members of the different protein kinase families and these
motifs have been targeted most frequently in PCR-based homology
cloning strategies aimed at identifying new family members.

Figure 1. Multiple alignments of 60 kinase domains representative of members of the eukaryotic protein kinase superfamily. The
abbreviated names used are as defined in Table 1. The single letter amino acid code is used and gaps are indicated by dashes. The
entire sequences for the larger inserts are not shown, but excluded residues are indicated as numbers in brackets. Twelve distinct
subdomains are indicated by Roman numerals. The consensus line is given according to the following code: uppercase letters, invariant
residues; lowercase letters, nearly invariant residues; o, positions conserving nonpolar residues; *, positions conserving polar
residues; positions conserving small residues with near neutral polarity. Residues corresponding to the numbered β-strands (b)
and α-helices (a) in PKA-Cα are indicated in the 2° structure line.

Relationship between conserved subdomains, higher order structure,
and catalytic mechanism

The homologous nature of the kinase domains implies that they all
fold into topologically similar 3-dimensional core structures and
impart phosphotransfer according to a common mechanism. The larger
inserts found within some kinase domains are likely to represent
surface elements that do not disrupt the basic core structure. With
the solution of the crystal structure of mouse PKA-Cα, in a binary
complex with a pseudosubstrate peptide inhibitor (PKI 5-24;
TTYADFIASGRTGRRNAIHD, the underlined Ala substituting for the Ser
phosphoacceptor), the general topology of a protein kinase catalytic
core structure was revealed for the first time (25, 26). Later,
structures of ternary complexes of PKA-Cα, the pseudosubstrate
inhibitor, and either MgATP or MnAMP-PNP (an MgATP analog) were
solved (27, 28). As a consequence of these studies, precise
functional roles for most of the highly conserved kinase domain
residues have now been assigned.

The kinase domain of PKA-Cα folds into a two-lobed 
structure (Fig. 2). The smaller, NH2-terminal lobe, which 
includes subdomains I-IV, is primarily involved in an- 
choring and orienting the nucleotide. This lobe has a 
predominantly antiparallel β-sheet structure that is 
unique among nucleotide binding proteins. The larger 
COOH-terminal lobe, which includes subdomains 
VIA-XI, is largely responsible for binding the peptide 
substrate and initiating phosphotransfer. It is predomi- 
nantly α-helical in content. Subdomain V residues span 
the two lobes. The deep cleft between the two lobes is 
recognized as the site of catalysis. The crystal structures 
of four additional eukaryotic protein kinase superfamily 
members, cyclin-dependent kinase 2 (Cdk2) (29), p42 
MAP kinase (Erk2) (30), twitchin kinase (31), and casein 
kinase I (32), have been reported more recently, and as 
expected, their kinase domains were found to fold into 
two-lobed structures topologically very similar to the 
catalytic core of PKA-Cα. Notable differences, however, 
were found in the regions corresponding to subdomain 
VIII in the Cdk2 and Erk2 structures, apparently reflect- 
ing the fact that these are structures of enzymes in an 
inactive state (see below). The twitchin structure is also of 
an inactive enzyme, but in this case it is inactive due to 
the presence of an autoinhibitory peptide sequence, 
which lies on the COOH-terminal side of the kinase do- 
main and folds back into the active site cleft between the 
two lobes (31). This peptide apparently forces the two 
lobes to rotate almost 30° with respect to one another, 
and in this configuration inactive twitchin is more similar 
to the open configuration of PKA-Cα without PKI (33). 
In both twitchin and Cdk2 the α-helix C in subdomain 
III also adopts a different position to that of helix C in 
PKA-Cα. Unfortunately, no structure of a protein-tyro- 
sine kinase catalytic domain was available at the time of 
writing (see "Note added in proof"), but the ease with 
which it has been possible to model the kinase domain of 
the EGF receptor protein-tyrosine kinase on to that of 
the PKA-Cα emphasizes that the structure of the pro- 
tein-tyrosine kinases will be similar to that of the pro- 
tein-serine kinases (34). 

The conserved kinase subdomains correspond quite 
well to precise units of higher order structure. The func- 
tions of the individual subdomains will be discussed 
briefly later on a subdomain-by-subdomain basis, mak- 
ing reference to the crystal structure of PKA-Cα and 
drawing attention to the proposed roles of the nearly 
invariant amino acid residues (25-27, 28) and other resi- 
dues of interest. For more detailed information, the 
reader is referred to recent reviews on the structure of 
PKA-Cα (35-37) and to an excellent comparative review 
of the structures of PKA-Cα, Erk2, and Cdk2 (38). 

Subdomain I, at the NH2 terminus of the kinase do- 
main, contains the consensus motif Gly-x-Gly-x-x-Gly- 
x-Val (starting with Gly50 in PKA-Cα). The kinase do- 
main NH2-terminal boundary occurs seven positions up- 
stream of the first glycine in the consensus, where a 
hydrophobic residue is usually found. Subdomain I resi- 
dues fold into a β-strand-turn-β-strand structure encom- 
passing β-strands 1 and 2, and this structure acts as a 
flexible flap or clamp that covers and anchors the non- 
transferable phosphates of ATP. The backbone amides of 
Ser53, Phe54, and Gly55 form hydrogen bonds with ATP 
β-phosphate oxygens. Leu49 and Val57 contribute to a 
hydrophobic pocket that encloses the adenine ring of 
ATP. 
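
The Gly-x-Gly-x-x-Gly-x-Val consensus and the boundary rule described above can be sketched as a simple pattern search. This is a minimal illustration, assuming one-letter amino acid strings; the function names and the toy sequence are hypothetical and do not come from the article.

```python
import re

# Subdomain I consensus from the text: Gly-x-Gly-x-x-Gly-x-Val ("x" = any residue).
# The lookahead form lets overlapping occurrences be reported as well.
GLY_LOOP = re.compile(r"(?=G.G..G.V)")


def find_gly_loops(seq: str) -> list[int]:
    """Return 0-based indices of the first Gly of each consensus match."""
    return [m.start() for m in GLY_LOOP.finditer(seq.upper())]


def domain_start(first_gly: int) -> int:
    """Approximate NH2-terminal boundary of the kinase domain: seven
    positions upstream of the first Gly (clamped at 0 for short inputs)."""
    return max(first_gly - 7, 0)


# Hypothetical toy sequence, not a real kinase.
toy = "MSTAEKFLRLGAGSFGTVYKAK"
hits = find_gly_loops(toy)
print(hits, [domain_start(h) for h in hits])
```

A match only flags a candidate: the text identifies a kinase domain by the full set of conserved subdomains, not by this motif alone.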



Subdomain II contains the invariant Lys (Lys72 in 
PKA-Cα), which has long been recognized as being essen- 
tial for maximal enzyme activity. This Lys lies within β- 
strand 3 of the small lobe, and helps anchor and orient 
ATP by interacting with the α- and β-phosphates. In 
addition, Lys72 forms a salt bridge with the carboxyl 
group of the nearly invariant Glu91 in subdomain III. 
Ala70 contributes to the hydrophobic adenine ring 
pocket. In PKA-Cα, β-strand 3 is followed immediately 
by α-helix B, which, judging from the sequence align- 
ment, appears to be quite a variable structure among the 
protein kinases. Indeed, this α-helix is absent in the 
Cdk2 and Erk2 crystal structures. 

Subdomain III represents the large α-helix C in the 
small lobe. The nearly invariant Glu residue (Glu91 in 
PKA-Cα) is centrally located in this helix and helps stabi- 
lize the interactions between Lys72 and the α- and β- 
phosphates of ATP. Subdomain IV corresponds to the 
hydrophobic β-strand 4 in the small lobe. This subdo- 
main contains no invariant or nearly invariant residues 
and does not appear to be directly involved in catalysis or 
substrate recognition. 

Figure 2. Ribbon diagram of the catalytic core of PKA-Cα (residues 
40-300) in a ternary complex with MgATP and pseudosubstrate 
peptide inhibitor (PKI 5-24). Invariant or nearly invariant resi- 
dues (Gly50, Gly52, Gly55, Lys72, Glu91, Asp166, Asn171, 
Asp184, Glu208, Asp220, and Arg280) are indicated by dots along 
the ribbon diagram. Side chains are shown for Lys72, Asp166, 
Asn171, Asp184, Glu208, and Arg280. β-strands and α-helices 
are indicated by flat arrows and helices, respectively, and are 
numbered according to Knighton et al. (26). The small arrow 
indicates the site of phosphotransfer with the Ala in PKI substi- 
tuting for the phosphoacceptor Ser in the true substrate. (Repro- 
duced, with permission, from Taylor et al. (36)). 

Subdomain V links the small and large lobes of the 
catalytic subunit and consists of the very hydrophobic 
β-strand 5 in the small lobe, the small α-helix D in the 
large lobe, and an extended chain that connects them. 
Three residues in the connecting chain of PKA-Cα, 
Glu121, Val123, and Glu127, help anchor ATP by forming 
hydrogen bonds with either the adenine or the ribose 
ring. Met120, Tyr122, and Val123 contribute to the hy- 
drophobic pocket surrounding the adenine ring. Glu127 
also participates in peptide binding by forming an ion 
pair with an Arg in the pseudosubstrate site of the PKA 
inhibitor peptide. This represents the first Arg in the PKA 
substrate recognition consensus Arg-Arg-x-Ser*-Hydro- 
phobic. 
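
The Arg-Arg-x-Ser*-Hydrophobic consensus, and the way the PKI 5-24 pseudosubstrate mimics it with Ala in the acceptor position, can be illustrated with two patterns. This is only a sketch: the set of residues treated as hydrophobic, the pattern names, and the helper function are assumptions made for illustration; the PKI 5-24 sequence is the one quoted earlier in the text.

```python
import re

# Assumed hydrophobic set; the text says only "Hydrophobic".
HYDROPHOBIC = "AVLIMFWY"

# Arg-Arg-x-Ser*-Hydrophobic, with Ser as the phosphoacceptor.
SUBSTRATE = re.compile(rf"RR.S[{HYDROPHOBIC}]")
# Same pattern with Ala in the acceptor position (pseudosubstrate).
PSEUDOSUBSTRATE = re.compile(rf"RR.A[{HYDROPHOBIC}]")

PKI_5_24 = "TTYADFIASGRTGRRNAIHD"  # sequence as given in the text


def acceptor_positions(pattern: re.Pattern, seq: str, first_residue: int = 1) -> list[int]:
    """Residue numbers of the acceptor-position residue for each match.

    first_residue is the number assigned to seq[0] (5 for PKI 5-24).
    """
    return [m.start() + 3 + first_residue for m in pattern.finditer(seq)]


print(acceptor_positions(SUBSTRATE, PKI_5_24, first_residue=5))
print(acceptor_positions(PSEUDOSUBSTRATE, PKI_5_24, first_residue=5))
```

The first call finds no phosphorylatable site in the inhibitor, while the second locates the Ala that occupies the acceptor position.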

Subdomain VIA folds into the large hydrophobic α-he- 
lix E that extends through the large lobe. None of the 
residues in helix E appear to interact directly with either 
MgATP or peptide substrate; hence this part of the mole- 
cule appears to act mainly as a support structure. Subdo- 
main VIB folds into the small hydrophobic β-strands 6 
and 7 with an intervening loop. Included here are two 
invariant residues (Asp166 and Asn171 in PKA-Cα) that 
lie within the consensus motif His-Arg-Asp-Leu-Lys- 
x-x-Asn (HRDLKxxN). The loop has been termed the 
catalytic loop because Asp166 within the loop has 
emerged as the likely candidate for the catalytic base, 
accepting the proton from the attacking substrate hy- 
droxyl group during an in-line phosphotransfer mecha- 
nism. Lys168 in the loop (substituted by Arg in the 
conventional protein-tyrosine kinases) may help facilitate 
phosphotransfer by neutralizing the negative charge of 
the γ-phosphate during transfer. The side chain of 
Asn171 helps to stabilize the catalytic loop through hydro- 
gen bonding to the backbone carbonyl of Asp166 and 
also acts to chelate the secondary Mg2+ ion that bridges 
the α- and γ-phosphates of the ATP. The carbonyl group 
of Glu170 forms a hydrogen bond with an ATP ribose 
hydroxyl group. Glu170 also participates in substrate 
binding by forming an ion pair with the second arginine 
of the peptide recognition consensus. 
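
The residue numbers quoted for the catalytic loop follow from anchoring the HRDLKxxN consensus on Asp166. The short sketch below works through that arithmetic; the helper name is hypothetical, and the only inputs are the consensus and the Asp166 anchor given in the text.

```python
# Consensus from the text: His-Arg-Asp-Leu-Lys-x-x-Asn (HRDLKxxN).
CATALYTIC_LOOP = ["His", "Arg", "Asp", "Leu", "Lys", "x", "x", "Asn"]


def number_motif(motif: list[str], anchor: str, anchor_number: int) -> dict[int, str]:
    """Assign residue numbers to consensus positions from one anchored residue.

    The first occurrence of `anchor` in the motif is given `anchor_number`.
    """
    offset = anchor_number - motif.index(anchor)
    return {offset + i: name for i, name in enumerate(motif)}


pka_loop = number_motif(CATALYTIC_LOOP, "Asp", 166)
for number, name in pka_loop.items():
    print(number, name)
```

The numbering places the consensus Lys at 168 and Asn at 171, matching the residues cited above, and puts position 170 on an "x", where the text reports Glu in PKA-Cα.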

Subdomain VII folds into a β-strand-loop-β-strand 
structure, encompassing β-strands 8 and 9. The highly 
conserved DFG triplet, corresponding to Asp184- 
Phe185-Gly186 in PKA-Cα, lies in the loop that is stabi- 
lized by a hydrogen bond between Asp184 and Gly186. 
Asp184 chelates the primary activating Mg2+ ions that 
bridge the β- and γ-phosphates of the ATP, and thereby 
helps to orient the γ-phosphate for transfer. In Cdk2, 
β-strand 9 is replaced with a small α-helix designated 
αL12. However, it is unclear whether this helical charac- 
ter is maintained when Cdk2 is in its active conformation. 

Subdomain VIII, which includes the highly conserved 
Ala-Pro-Glu ('APE') motif (residues 206-208 in 
PKA-Cα), folds into a tortuous chain that faces the cleft. 
Residues lying 7-10 positions immediately upstream of 
the APE motif are characteristically well-conserved 
among the members of different protein kinase families. 
The nearly invariant Glu corresponding to PKA-Cα 
Glu208 forms an ion pair with an invariant Arg (Arg280 
in PKA-Cα) in subdomain XI, thereby helping to stabilize 
the large lobe. 

Subdomain VIII appears to play a major role in recog- 
nition of peptide substrates. Several PKA-Cα subdomain 
VIII residues participate in binding the pseudosubstrate 
inhibitor peptide. Leu198, Cys199, Pro202, and Leu205 
of PKA-Cα provide a hydrophobic pocket that accommo- 
dates the side chain of the hydrophobic residue at posi- 
tion +1 of the substrate consensus (Ile for the inhibitor 
peptide). Gly200 forms a hydrogen bond with the same 
Ile residue. Glu203 forms two ion pairs with the Arg in 
the high-affinity binding region of the inhibitor peptide. 

Many protein kinases are known to be activated by 
phosphorylation of residues in subdomain VIII. In 
PKA-Cα, maximal kinase activity requires phosphoryla- 
tion of Thr197, probably occurring through an intermo- 
lecular autophosphorylation mechanism (39). In the 
crystal structure, phosphate oxygens of phospho-Thr197 
form hydrogen bonds with the charged side chains of 
Arg165, Lys189, and the hydroxyl group of Thr195, and 
thereby may act to stabilize the subdomain VIII loop in 
an active conformation permitting proper orientation of 
the substrate peptide. For members of the Erk (MAP) 
kinase family, phosphorylation of both a Thr and a Tyr 
residue in subdomain VIII (mediated by members of the 
MEK kinase family) is required for activation. In the crys- 
tal structure determined for Erk2, these residues (Thr183 
and Tyr185) were not phosphorylated and thus the en- 
zyme was in an inactive state (unlike the PKA-Cα struc- 
ture). The unphosphorylated Tyr185 is buried in a 
hydrophobic pocket, and interactions with Tyr185 are 
apparently required to hold the enzyme in the inactive 
state. Mutation of Tyr185, however, does not activate the 
enzyme, and so phosphorylation of Tyr185 must also play 
a role in activation. Unphosphorylated Erk2 appears to be 
inactive because residues required for catalysis are not 
properly oriented, and because its conformation results 
in a partial steric block to substrate binding. During acti- 
vation of Erk2, Tyr185 phosphorylation precedes Thr183 
phosphorylation; therefore, binding of MEK to Erk2 may 
alter the conformation of the subdomain VIII loop, 
thereby exposing Tyr185 for phosphorylation by MEK. 
Interaction of phospho-Tyr185 with surface residues 
would then allow the subdomain VIII loop to adopt the 
active conformation (30). Subsequent phosphorylation of 
the exposed Thr183 may activate the enzyme fully by 
promoting correct alignment of the catalytic residues. 
From the crystal structure of Cdk2, likewise in an inactive 
unphosphorylated state, the subdomain VIII loop appears 
to be in a conformation that would inhibit enzyme activity 
by sterically blocking the presumed protein substrate 
binding cleft (29). Phosphorylation of Thr160 in the Cdk2 
subdomain VIII, mediated by MO15 (CAK), presumably 
would act to remove this inhibition by stabilizing the loop 
in an active conformation similar to that found in 
PKA-Cα. Cyclin binding to the NH2-terminal lobe is also 
needed to activate Cdk2, and this may cause rotation of 
the NH2-terminal domain resulting in correct alignment 
of catalytic residues. 

Subdomain IX corresponds to the large α-helix F of 
the large lobe. The nearly invariant Asp corresponding to 
PKA-Cα Asp220 lies in the NH2-terminal region of this 
helix and acts to stabilize the catalytic loop by hydrogen 
bonding to the backbone amides of Arg165 and Tyr164 
that precede the loop. Glu230 of PKA-Cα forms an ion 
pair with the second Arg of the peptide recognition con- 
sensus. PKA-Cα residues 235-239 are all involved in hy- 
drophobic interactions with the inhibitor peptide. 

Subdomain X is the most poorly conserved subdomain 
and its function is obscure. In the crystal structure of 
PKA-Cα, it corresponds to the small α-helix G that occu- 
pies the base of the large lobe. Members of the Cdk, Erk 
(MAP), GSK3, and Clk kinase families (the C-M-G-C 
group) all have rather large insertions between subdo- 
mains X and XI, whose functional significance is presently 
unclear. Subdomain XI extends to the COOH-terminal 
end of the kinase domain. The most notable feature here 
is the nearly invariant Arg corresponding to Arg280 in 
PKA-Cα, which lies between α-helices H and I. The 
COOH-terminal boundary of the kinase domain is still 
poorly defined. For many protein-serine kinases, the con- 
sensus motif His-x-Aromatic-Hydrophobic is found be- 
ginning 9-13 residues downstream of the invariant Arg. 
For protein-tyrosine kinases, a hydrophobic amino acid 
lying 10 positions downstream of the invariant Arg ap- 
pears to define the COOH-terminal boundary. 

The amphipathic α-helix A of PKA-Cα (residues 
15-35; not shown in Fig. 2), though lying outside of the 
conserved catalytic core on the NH2-terminal side, ap- 
pears to be an important feature found in many protein 
kinases (40). This helix spans the surface of both lobes of 
the core structure and complements and stabilizes the 
hydrophobic cleft between the two lobes. The A-helix 
motif appears to be present in many other protein kinases 
including members of the protein kinase C family and the 
Src family of protein-tyrosine kinases (40). 

CLASSIFICATION OF EUKARYOTIC PROTEIN 
KINASES 

To facilitate analysis and management of this large super- 
family we have devised the classification scheme shown in 
Table 1, which subdivides the known members of the 
eukaryotic protein kinase superfamily into distinct fami- 
lies that share basic structural and functional properties. 
Phylogenetic trees derived from an alignment of kinase 
domain amino acid sequences (essentially an expanded 
version of Fig. 1) served as the basis for this classification. 
Thus, the sole consideration was similarity in kinase do- 
main amino acid sequence. When considered alone, how- 
ever, this property has been a good indicator of other 
characteristics held in common by the different members 
of the family. 

Protein kinases whose entire kinase domain amino acid 
sequence had been published by July 1993 were included 
in phylogenetic analysis (as well as a few others made 
available at that time through sequence databases). If a 
given kinase domain sequence had been determined from 
more than one species among the vertebrates (i.e., or- 
thologous gene products), only one representative (usu- 
ally human) was included in the analysis. This policy was 
not used for the other phyla, however, because of greater 
divergences between the species and, hence, the se- 
quences. The kinase domain phylogenies were inferred 
using the principle of maximum parsimony according to 
the PAUP software package developed by Swofford (41). 
Minimum-length trees were found using PAUP's 'heuris- 
tic' search method with branch swapping by the 'tree 
bisection-reconnection' strategy. Equal weights were 
given for all amino acid substitutions. Because multiple 
minimum-length trees were found, a consensus tree was 
calculated according to the method of Adams (cited in ref 
41) in order to show branching ambiguities. 
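
The quantity that a maximum parsimony search minimizes, the tree length, can be illustrated with Fitch's small-parsimony algorithm, which counts the fewest substitutions a fixed tree requires when all substitutions carry equal weight. The sketch below is a stand-in for illustration only: it scores a given tree and does not reproduce PAUP's heuristic search or tree bisection-reconnection swapping. The taxon names and the four-residue alignment are invented.

```python
def fitch_site(tree, column):
    """Return (state_set, cost) for one alignment column.

    tree   : a leaf name (str) or a 2-tuple of subtrees
    column : dict mapping leaf name -> residue at this site
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], column)
    right_set, right_cost = fitch_site(tree[1], column)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no substitution needed here.
        return shared, left_cost + right_cost
    # Children disagree: one substitution, equal weight for any residue pair.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Sum the per-site Fitch costs over all columns of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical aligned fragments for four made-up kinases.
alignment = {"k1": "GAGK", "k2": "GAGK", "k3": "GSGR", "k4": "GSGR"}
tree_a = (("k1", "k2"), ("k3", "k4"))
tree_b = (("k1", "k3"), ("k2", "k4"))
print(tree_length(tree_a, alignment), tree_length(tree_b, alignment))
```

Here the first topology is the shorter tree, so a parsimony search would prefer it; a heuristic search repeats this kind of scoring over many rearranged topologies.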

To accommodate the large numbers of sequences, it 
was necessary to construct five separate trees. Initially, a 
skeleton tree of 99 kinases was obtained (Fig. 3A). The 
skeleton tree included only representative members from 
each of four large groups of protein kinases, each consist- 
ing of multiple related families known from previous 
work to cluster together in the tree. These four groups 
are designated: 1) the AGC group, which includes the 
cyclic nucleotide-dependent family (PKA and PKG), the 
protein kinase C (PKC) family, the β-adrenergic receptor 
kinase (βARK) family, the ribosomal S6 kinase family, and 
other close relatives; 2) the CaMK group, which includes 
the family of protein kinases regulated by calcium/cal- 
modulin, the Snf1/AMPK family, and other close rela- 
tives; 3) the CMGC group, which includes the family of 
cyclin-dependent kinases, the Erk (MAP) kinase family, 
the glycogen synthase kinase 3 (GSK3) family, the casein kinase 
II family, the Clk (Cdk-like kinase) family, and other close 
relatives; and 4) the 'conventional' protein-tyrosine ki- 
nase (PTK) group. Separate trees (Fig. 3B-E) were later 
obtained for each of the four large kinase groups, and 
contain all members of the groups whose sequences were 
available at the time of analysis. 
Figure 3. Phylogenetic trees of the eukaryotic protein kinase 
superfamily inferred from kinase domain amino acid sequence 
alignments. The abbreviated nomenclature is the same used in 
Table 1. A) 'Skeleton' tree showing 99 protein kinases. Positions 
of 4 clusters (AGC, CaMK, CMGC, and PTK) containing protein 
kinases representative of larger groups are indicated in the skele- 
ton tree. B) AGC group tree of 59 protein kinases including PKA, 
PKG, and PKC and other close relatives. C) CaMK group tree of 
35 protein kinases including the calcium/calmodulin-regulated 
enzymes. D) CMGC group tree of 59 protein kinases including 
the cyclin-dependent kinases. E) PTK group tree of 90 conven- 
tional protein-tyrosine kinases. Tree A is unrooted and drawn 
with Pkn1 and Pkn2 as outgroups. Outgroups of two or more 
distantly related protein kinases (not shown) were included in the 
analysis of trees B-E to provide a rooting point. Asterisks (*) in 
all trees indicate branches leading to defined protein kinase 
families listed in Table 1. Branch lengths indicate number of 
amino acid substitutions required to reach hypothetical common 
ancestors at internal nodes. 
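
The four-group scheme can be held in a small lookup table. The sketch below is one possible encoding, assuming family labels paraphrased from the text; it omits the "other close relatives" that the article assigns to each group, and the names `KINASE_GROUPS` and `group_of` are hypothetical.

```python
# Groups and their named families, as listed in the text.
KINASE_GROUPS = {
    "AGC": [
        "cyclic nucleotide-dependent (PKA, PKG)",
        "protein kinase C",
        "beta-adrenergic receptor kinase",
        "ribosomal S6 kinase",
    ],
    "CaMK": ["calcium/calmodulin-regulated", "Snf1/AMPK"],
    "CMGC": [
        "cyclin-dependent kinase",
        "Erk (MAP) kinase",
        "GSK3",
        "casein kinase II",
        "Clk",
    ],
    "PTK": ["conventional protein-tyrosine kinase"],
}

# Reverse index: family label -> group name.
FAMILY_TO_GROUP = {
    family: group
    for group, families in KINASE_GROUPS.items()
    for family in families
}


def group_of(family: str) -> str:
    """Return the group for a family label, or "other" if it is not listed."""
    return FAMILY_TO_GROUP.get(family, "other")


print(group_of("GSK3"), group_of("casein kinase I"))
```

Families that fall outside the four groups, such as casein kinase I, come back as "other", mirroring the article's residual category.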



It can be reasonably surmised that the protein kinases 
having closely related catalytic domains, and thus defining 
a family, represent products of genes that have under- 
gone relatively recent evolutionary separations. Given 
this, it should come as no surprise that members of a 
given family tend also to share related functions. This is 
manifest by similarities in overall structural topology, 
mode of regulation, and substrate specificity. The details 
of the common properties exhibited by the members of 
the various kinase families can best be gleaned from 
studying the information outlined in the individual en- 
tries section of the Protein Kinase Factsbook (42). Some of 
the most salient relationships are discussed below. 

The AGC group protein kinases tend to be basic amino 
acid-directed enzymes, phosphorylating substrates at 
Ser/Thr residues lying very near Arg and Lys. For the 
cyclic nucleotide-dependent and ribosomal S6 kinase 
families, the preferred substrates have basic residues lying 
in specific positions NH2-terminal to the phosphate ac- 
ceptor. Preferred substrates for the PKC and RAC fami- 
lies have basic residues on both the NH2- and COOH- 
terminal sides of the acceptor (43). The G-protein-cou- 
pled receptor kinases (βARK and RhK) appear to break 
this rule, however, as they are reported to prefer synthetic 
peptide substrate residues located within an acidic envi- 
ronment. Little substrate information is available for the 
other families in this group. 



The CaMK group protein kinases also tend to be basic 
amino acid-directed, and in this regard it is notable that 
the AGC and CaMK groups fall near one another in the 
phylogenetic tree. CaMK1, CaMK2, CaMK4, MLCK, 
CDPK, and AMPK are all reported to prefer substrates 
with basic residues at specific positions NH2-terminal to 
the acceptor site, whereas EF2K and PhK prefer sites with 
basic residues at both NH2- and COOH-terminal loca- 
tions. Many, but not all, of the CaMK group protein 
kinases are known to be activated by Ca2+/calmodulin 
binding to a small domain located just COOH-terminal 
to the catalytic domain, e.g., CaMK1, CaMK2, CaMK4, 
PhKγ, MLCK, and twitchin. These enzymes and their 
close relatives are grouped together in a large family 
within the CaMK group. Also included in this family are 
a subfamily of plant enzymes (represented by CDPK) that 
contain an intrinsic calmodulin-like domain that confers 
Ca2+-dependent activation. The other family within the 
CaMK group is the Snf1/AMPK family. Within this fam- 
ily, substrate specificity determinant information has 
been obtained only for the AMP-activated protein kinase, 
which also shows a requirement for an NH2-terminal 
basic residue. The other major category of protein-serine 
kinases is the CMGC group. For the most part, these are 
proline-directed enzymes, phosphorylating substrates at 
sites lying in Pro-rich environments. Available data for 
Cdc2 and Cdk2 indicate that members of the cyclin-de- 
pendent kinase family require phosphate acceptors lying 
immediately NH2-terminal to a Pro. A similar require- 
ment is indicated for the Erk (MAP) kinase family. The 
situation for the GSK3 family is more complicated, but 
most known acceptor sites lie within Pro-rich regions. 
The structures of Cdk2 and Erk2 indicate that the pocket 
for the +1 residue is shallower than in PKA-Cα due to the 
replacement of Leu205 by an Arg, which is bulkier and 
precludes binding of the larger hydrophobic amino acids. 
In addition, the unique secondary amide group of Pro 
may make special interactions (44). The casein kinase II 
family enzymes fail to conform to the proline-directed 
specificity exhibited by the other major families of this 
group, showing instead a strong preference for Ser resi- 
dues located NH2-terminal to a cluster of acidic residues. 
The CMGC group protein kinases have larger-than-aver- 
age kinase domains due to insertions between subdo- 
mains X and XI, whose functional significance is 
unknown. 
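
The group-level substrate preferences described above (basic residues near the acceptor, a Pro immediately after it, or a downstream acidic cluster) can be turned into a rough labeling of a candidate Ser/Thr site. This is a sketch under stated assumptions, not a predictor from the article: the four-residue window, the threshold of two acidic residues for a "cluster", the label strings, and the toy peptides are all hypothetical choices.

```python
BASIC = set("RK")
ACIDIC = set("DE")


def site_context(seq: str, i: int, window: int = 4) -> list[str]:
    """Label the Ser/Thr at 0-based index i by its flanking residues."""
    if seq[i] not in "ST":
        raise ValueError("position is not Ser or Thr")
    upstream = seq[max(i - window, 0):i]      # NH2-terminal side
    downstream = seq[i + 1:i + 1 + window]    # COOH-terminal side
    labels = []
    if i + 1 < len(seq) and seq[i + 1] == "P":
        # Acceptor lying immediately NH2-terminal to a Pro.
        labels.append("proline-directed")
    if any(r in BASIC for r in upstream):
        labels.append("basic NH2-terminal")
    if any(r in BASIC for r in downstream):
        labels.append("basic COOH-terminal")
    if sum(r in ACIDIC for r in downstream) >= 2:
        labels.append("acidic COOH-terminal cluster")
    return labels


# Hypothetical toy peptides.
print(site_context("AARRASLA", 5))
print(site_context("AAATPAAA", 3))
print(site_context("AAASEEDA", 3))
```

The labels correspond loosely to the AGC and CaMK pattern, the cyclin-dependent and Erk pattern, and the casein kinase II pattern, respectively; real specificity depends on exact positions that this sketch does not model.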

The conventional protein-tyrosine kinase group in- 
cludes a large number of enzymes with quite closely re- 
lated kinase domains that specifically phosphorylate on 
Tyr residues (i.e., they cannot phosphorylate Ser or Thr). 
These enzymes, first recognized among retroviral onco- 
proteins, have been found only in metazoan cells where 
they are widely recognized for their roles in transducing 
growth and differentiation signals. Included in this group 
are more than a dozen distinct receptor families made up 
of membrane-spanning molecules that share similar over- 
all structural topologies, and nine nonreceptor families 
also composed of structurally similar molecules. The 
specificity determinants surrounding the Tyr phosphoac- 
ceptor sites have yet to be firmly established for these 
enzymes, but Glu residues either on the NH2- or COOH- 
terminal side of the acceptor are often preferred. This 
group is labeled "conventional" to distinguish it from 
other protein kinases (including Spk1, Clk, the MEK/Ste7 
family members, Wee1/Mik1, ActRII, Hrr25, Esk, and 
SplA/DPyk2) reported to exhibit a dual specificity, that 
is, being capable of phosphorylating both Tyr and 
Ser/Thr residues (45). However, in most cases dual speci- 
ficity has been observed only for autophosphorylation 
reactions in vitro, and the only dual specificity protein 
kinases that are known to be able to phosphorylate a 
substrate on Ser/Thr and Tyr are members of the MEK 
family. Considered as a group, these dual-specificity pro- 
tein kinases are not particularly closely related to the 
conventional PTKs. Indeed, they seem to map through- 
out the phylogenetic tree (45), suggesting that the ability 
to autophosphorylate on Tyr may have had many inde- 
pendent origins during the evolutionary history of the 
superfamily. 

The protein kinases falling outside the four major 
groups are a mixed bag. Although the individual mem- 
bers within the defined families found in this "other" 
category clearly are related to one another through both 
structure and function, it is difficult to make broader 
generalizations that could group any of these families 
together into a larger category. As far as substrate speci- 
ficity determinants go, little is known about most "other" 
category protein kinases, due primarily to their rather 
recent discovery and the paucity of known physiological 
substrates. The casein kinase I family members, however, 
have been shown to prefer Ser/Thr residues located 
COOH-terminal to a phosphoserine or phosphothreon- 
ine, although a stretch of acidic residues may substitute. 
Also, the family of protein kinases involved in transla- 
tional control (HRI, PKR/Tik, Gcn2) appear to be basic 
amino acid-directed enzymes preferring Ser residues ly- 
ing NH2-terminal to an Arg. Finally, as mentioned pre- 
viously, the MEK/Ste7 family protein kinases and 
Wee1/Mik1 protein kinases exhibit a dual specificity. 

Although this classification is based solely on catalytic 
domain sequences, members of families defined by this 
means are usually closely related in regions lying outside 
the catalytic domains and in many cases have been shown 
to possess very similar functions. Thus, intercalation of 
newly discovered protein kinases into this classification 
should allow one to make useful predictions about the 
functions of such enzymes. 



FUTURE PROSPECTS 

The rate of protein kinase discovery still shows no signs 
of abating. In addition to the continuing successes of 
homology-based approaches, genomic sequencing pro- 
jects are beginning to make significant contributions. For 
instance, the sequences of two entire budding yeast chro- 
mosomes (46, 47) and a ~2 Mb stretch of C. elegans chro- 
mosome III (48) have revealed a number of new putative 
protein kinase genes. As genome sequencing projects 
gather speed, the number of new protein kinase genes 
discovered in this way will undoubtedly mushroom. This 
explosion of sequence data is making it increasingly diffi- 
cult to manage protein kinase databases of the sort de- 
scribed here. Programs designed to align and derive 
relatedness trees are currently unable to handle the large 
number of available kinase domain sequences. New data 
handling programs will have to be developed to cope with 
large numbers of sequences like those of the eukaryotic 
protein kinase superfamily. 

Protein kinase catalytic domain structures will continue 
to be solved. The first structure of a conventional pro- 
tein-tyrosine kinase will be available shortly (see "Note 
added in proof"), and this should reveal how Tyr is se- 
lected as an acceptor amino acid vs. Ser/Thr. Such struc- 
tures will enable comparative analysis to be carried out at 
the 3-dimensional level, and allow predictions of struc- 
tures from primary sequences. Structural comparisons of 
catalytic domains with bound peptide substrates will also 
provide insights into substrate specificity. Most protein 
kinases show some degree of primary sequence specific- 
ity, and new methods are being developed to determine 
consensus sequence specificities for individual protein ki- 
nases (44). With such consensus information the struc- 
tural basis for the binding of a preferred peptide 
sequence to the cognate substrate binding site can then 
be deduced. In the future, it may be possible to model the 
3-dimensional structure of a novel protein kinase cata- 
lytic domain with sufficient accuracy to be able to deduce 
the preferred primary sequence surrounding the hy- 
droxyamino acid it phosphorylates, which in turn will 
allow one to predict what proteins might be its substrates 
from the increasingly complete database of protein se- 
quences. 



Note added in proof: The crystal structure of the tyrosine kinase 
domain of the insulin receptor has now appeared (Hubbard, 
S. R., Wei, L., Ellis, L., and Hendrickson, W. A. (1994) Nature 372, 
746-754). 
The Protein Kinase Family: Conserved 
Features and Deduced Phylogeny 
of the Catalytic Domains 

Steven K. Hanks, Anne Marie Quinn, Tony Hunter 



In recent years, members of the protein kinase family have 
been discovered at an accelerated pace. Most were first 
described, not through the traditional biochemical ap- 
proach of protein purification and enzyme assay, but as 
putative protein kinase amino acid sequences deduced 
from the nucleotide sequences of molecularly cloned 
genes or complementary DNAs. Phylogenetic mapping of 
the conserved protein kinase catalytic domains can serve 
as a useful first step in the functional characterization of 
these newly identified family members. 



THE PROTEIN KINASES ARE A LARGE FAMILY OF ENZYMES, 
many of which mediate the response of eukaryotic cells to 
external stimuli (1, 2). The number of unique members of 
the protein kinase family that have been described has recently risen 
exponentially (3) and now approaches 100. The surge in the number 
of known protein kinases has been largely due to the advent of gene 
cloning and sequencing techniques. Amino acid sequences deduced 
from nucleotide sequences are considered to represent protein 
kinases if they include certain key residues that are highly conserved 
in the protein kinase "catalytic domain." 

Two different molecular approaches have been most instrumental 
in the isolation of novel protein kinase-encoding genes or cDNAs: 
(i) complementation or suppression of genetic defects in inverte- 
brate regulatory mutants, and (ii) screening DNA libraries by using 
protein kinase genes as hybridization probes under low stringency 
conditions. Recently, an approach that uses degenerate oligonucleo- 
tides as probes has led to the identification of several novel putative 
protein kinase genes and cDNAs (4, 5). The oligonucleotide probes 
are designed to recognize target sequences that encode short amino 
acid stretches highly conserved in protein kinase catalytic domains. 
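
Why such probes must be degenerate can be seen by counting how many distinct nucleotide sequences could encode a short conserved peptide. The sketch below multiplies the number of codons per residue in the standard genetic code; the function name and the example peptides are chosen here for illustration and are not taken from the article's probe designs.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code
# (61 sense codons in total).
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def coding_sequences(peptide: str) -> int:
    """Number of distinct nucleotide sequences that can encode the peptide."""
    return prod(CODON_COUNT[aa] for aa in peptide.upper())


for peptide in ("DFG", "HRDLK"):
    print(peptide, coding_sequences(peptide))
```

Stretches rich in Met, Trp, or two-codon residues keep this count low, which is one reason short, highly conserved stretches make practical probe targets.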

In this article, we present an alignment of catalytic domain amino 
acid sequences from 65 different members of the protein kinase 
family, including many putative protein kinase sequences recently 
deduced from nucleotide sequence data. Based on this alignment, 
we first identify and discuss conserved features of the catalytic 
domains and then provide a visual display of the various interse- 
quence relations through construction of a catalytic domain phylo- 
genetic tree. Catalytic domains from protein kinases having similar 
modes of regulation or substrate specificities are found to cluster 
together within the tree. This clustering would appear to be of 
predictive value in the determination of the properties and function 
of novel protein kinases. 



Catalytic Domain Amino Acid Sequences 

Protein kinase catalytic domains range from 250 to 300 amino 
acid residues, corresponding to about 30 kD. Fairly precise bound- 
aries for the catalytic domains have been defined through an analysis 
of conserved sequences (1, 6, see below) as well as by assay of 
truncated enzymes (7, 8). The location of the catalytic domain 
within the protein is not fixed but, in most single subunit enzymes, it 
lies near the carboxyl terminus, the amino terminus being devoted to 
a regulatory role. In protein kinases having a multiple subunit 
structure, subunit polypeptides consisting almost entirely of catalyt- 
ic domain are common. All protein kinases thus far characterized 
with regard to substrate specificity fall within one of two broad 
classes, serine/threonine-specific and tyrosine-specific. Although 
both classes of protein kinase have very similar catalytic domain 
primary structures, certain short amino acid stretches appear to 
characterize each class (4), and these regions can be used to predict 
whether a putative protein kinase will phosphorylate tyrosine or 
serine/threonine. 

Members of the protein-serine/threonine kinase and protein-tyrosine
kinase families with reported catalytic domain amino acid
sequences are listed in Tables 1 and 2, respectively. They are
classified within the tables according to similarities in primary
structure, based on deduced catalytic domain phylogeny. Included
in the tables are all confirmed and putative protein kinases for which
the catalytic domain sequence was available as of November 1987
(9). Presumed functional homologs from different vertebrate species
are listed together. Presumed invertebrate functional homologs of
protein kinases also found in vertebrates, however, are given
separate listings as a reflection of greater evolutionary distance and
the possibility of functional divergence. The asterisks indicate
protein kinases that have catalytic domains that are included in the
amino acid sequence alignment. We will use the abbreviated names
from the tables to refer to individual protein kinases.

Of the 45 unique vertebrate protein kinase family members
included in Tables 1 and 2, 22 are serine/threonine-specific and 23
are tyrosine-specific. Fourteen of the vertebrate protein-serine/threonine
kinases fall within one of the three subgroups that can be
classified according to their mode of regulation: cyclic nucleotide-dependent,
calcium-phospholipid-dependent, and calcium-calmodulin-dependent.
Two of the serine/threonine kinases, Mos and Raf
(products of the c-mos and c-raf genes, respectively), are cellular
homologs of transforming proteins encoded by the retroviral oncogenes.
Other members of the serine/threonine group with demonstrated
oncogenic potential are A-Raf (a distinct Raf-related member),
and PIM-1 (a putative transforming protein activated by viral
integration). Three vertebrate serine/threonine kinases (CDC2Hs,
PSK-J3, and CKIIα) are closely related, by various degrees, to the
yeast cell cycle control protein kinases CDC28 and cdc2+. A
protein-serine/threonine kinase has been described in herpes simplex
virus (HSVK) and, like the retroviral oncogenes, probably
originated as a eukaryotic cellular sequence. The protein-tyrosine
kinases can be further grouped as members of either the Src
subfamily or one of three different growth factor receptor subfamilies.
The protein-tyrosine kinases encoded by the c-abl and c-fes/fps
genes may be considered distant members of the Src subfamily. At
least nine of the protein-tyrosine kinase genes have been transduced
by retroviruses where they encode transforming proteins.

Twenty-five additional sequences listed in Tables 1 and 2 derive
from invertebrate species. Eight are from Drosophila, one
from nematode, and the other 16 are from the budding or fission
yeasts. Many of the Drosophila protein kinases, as well as the
nematode protein kinase, were identified by screening DNA libraries
with probes from a vertebrate protein kinase gene or cDNA and
thus are likely to represent functional homologs of the vertebrate
enzymes. The Drosophila "sevenless" (7less) protein kinase and most
of the yeast protein kinases were identified through molecular
genetics. All of the yeast protein kinases identified to date fall within
the serine/threonine-specific class, despite directed attempts to
identify protein-tyrosine kinases in yeast (5). This observation,
together with the fact that many of the protein-tyrosine kinase
catalytic domains are components of growth factor receptor molecules,
suggests that tyrosine specificity may have been a recent
development in catalytic domain evolution, arising in conjunction
with the acquisition of multicellularity and serving a role in cell-cell
communication.

Table 1. Protein-serine/threonine kinase family members.

A. Cyclic nucleotide-dependent subfamily
cAPK-α: cAMP-dependent protein kinase catalytic subunit, α form
  *-bovine cardiac muscle protein (26)
  -mouse S49 lymphoma cell cDNA (35)
cAPK-β: cAMP-dependent protein kinase catalytic subunit, β form
  *-bovine pituitary cDNA (36)
  -mouse S49 lymphoma cell cDNA (37)
SRA3: cAMP-dependent protein kinase from yeast, RAS suppressor
  *-Saccharomyces cerevisiae genomic DNA (38)
TPK1 (PK25): cAMP-dependent protein kinase from yeast, type 1
  *-S. cerevisiae genomic DNA (39, 40)
TPK2: cAMP-dependent protein kinase from yeast, type 2
  *-S. cerevisiae genomic DNA (39)
TPK3: cAMP-dependent protein kinase from yeast, type 3
  *-S. cerevisiae genomic DNA (39)
cGPK: guanosine 3',5'-monophosphate (cGMP)-dependent protein kinase
  *-bovine lung protein (41)

B. Calcium-phospholipid-dependent subfamily
PKC-α: protein kinase C, α form
  *-bovine brain cDNA (42)
  -rabbit brain cDNA (43)
  -human brain cDNA (partial) (44)
PKC-β: protein kinase C, β form
  *-bovine brain cDNA (44)
  -rat brain cDNA (two splice forms) (45, 46)
  -rabbit brain cDNA (two splice forms) (43)
  -human brain cDNA (44)
PKC-γ: protein kinase C, γ form
  *-bovine brain cDNA (44)
  -rat brain cDNA (45)
  -human brain cDNA (44)
PKC-ε: protein kinase C, ε form
  -rat brain cDNA (RP16 clone) (partial) (46)
DPKC: Drosophila gene product related to protein kinase C
  *-D. melanogaster cDNA (47)

C. Calcium-calmodulin-dependent subfamily
CaMII-α: calcium-calmodulin-dependent protein kinase type II, α subunit
  *-rat brain cDNA (48)
CaMII-β: calcium-calmodulin-dependent protein kinase type II, β subunit
  *-rat brain cDNA (49)
PhK-γ: phosphorylase kinase, γ subunit
  *-rabbit skeletal muscle protein and cDNA (50)
  -mouse muscle cDNA (51)
MLCK-K: myosin light chain kinase, skeletal muscle
  *-rabbit skeletal muscle protein (52)
MLCK-M: myosin light chain kinase, smooth muscle
  *-chicken gizzard cDNA (53)
PSK-H1: putative protein-serine kinase
  *-human HeLa cell cDNA (4, 54)
PSK-C3: putative protein-serine kinase
  -human HeLa cell cDNA (partial) (4)

D. SNF1 subfamily
SNF1: "sucrose nonfermenting" mutant wild-type gene product
  *-S. cerevisiae genomic DNA (55)
nim1+: "new inducer of mitosis"; suppressor of cdc25 mutants
  *-Schizosaccharomyces pombe genomic DNA (56)
KIN1: putative yeast protein kinase
  *-Saccharomyces cerevisiae genomic DNA (5)
KIN2: putative yeast protein kinase related to KIN1
  *-S. cerevisiae genomic DNA (5)

E. CDC28-cdc2+ subfamily
CDC28: "cell-division-cycle" gene product in yeast
  *-S. cerevisiae genomic DNA (57)
cdc2+: "cell-division-cycle" gene product in yeast
  *-Schizosaccharomyces pombe genomic DNA (58)
CDC2Hs: human functional homolog of cdc2+
  *-human transformed cell line cDNA (33)
PSK-J3: putative protein kinase related to CDC28-cdc2+
  *-human HeLa cell cDNA (4, 59)
KIN28: putative protein kinase related to CDC28-cdc2+
  *-Saccharomyces cerevisiae genomic DNA (60)

F. Casein kinase subfamily
CKIIα: casein kinase II, α subunit
  -bovine lung protein (partial) (61)
DCKII: Drosophila casein kinase II, α subunit
  *-D. melanogaster cDNA (62)

G. Raf-Mos proto-oncogene subfamily
Raf: cellular homolog of oncogene products from 3611 murine
  sarcoma virus and Mill Hill 2 avian acute leukemia virus
  *-human fetal liver cDNA (63)
A-Raf: cellular oncogene product closely related to Raf
  *-human T cell cDNA (64)
  -mouse spleen cDNA (65)
PKS: cellular gene product closely related to Raf
  *-human fetal liver cDNA (66)
Mos: cellular homolog of oncogene product from Moloney murine
  sarcoma virus
  *-human placenta genomic DNA (67)
  -mouse NIH 3T3 cell genomic DNA (68)
  -rat 3Y1 cell genomic DNA (69)

H. STE7 subfamily
STE7: "sterile" mutant wild-type allele gene product
  *-S. cerevisiae genomic DNA (70)
PBS2: polymyxin B antibiotic resistance gene product
  *-S. cerevisiae genomic DNA (71)

I. Family members with no close relatives
CDC7: "cell-division-cycle" gene product
  *-S. cerevisiae genomic DNA (72)
wee1+: "reduced size at division" mutant wild-type gene product
  *-Schizosaccharomyces pombe genomic DNA (73)
ran1+: "meiotic bypass" mutant wild-type allele gene product
  *-S. pombe genomic DNA (74)
PIM-1: putative transforming protein induced by murine leukemia
  virus integration
  *-mouse BALB/c cell genomic DNA (75)
HSVK: herpes simplex virus US3 gene product
  *-herpes simplex virus genomic DNA (76)

*Protein kinases that have catalytic domains included in the amino acid sequence alignment.

Table 2. Protein-tyrosine kinase family members.



A. Src subfamily
Src: cellular homolog of oncogene product from Rous avian sarcoma
  virus
  *-human fetal liver genomic DNA (77)
  -mouse brain cDNA; neuronal alternate splice form (78)
  -chicken genomic DNA (79)
  -Xenopus laevis ovary cDNA (partial) (80)
Yes: cellular homolog of oncogene product from Yamaguchi 73 avian
  sarcoma virus
  *-human embryo fibroblast cDNA (81)
Fgr: cellular homolog of oncogene product from Gardner-Rasheed
  feline sarcoma virus
  *-human genomic DNA (82)
  -human B lymphocyte cell line cDNA (amino terminus) (83)
FYN: putative protein-tyrosine kinase related to Fgr and Yes
  *-human fibroblast cDNA (84)
LYN: putative protein-tyrosine kinase related to LCK and Yes
  *-human placenta cDNA (85)
LCK: lymphoid cell protein-tyrosine kinase
  *-human (JURKAT) T cell leukemia line cDNA (86)
  -mouse (LSTRA) T cell lymphoma line cDNA (87)
HCK: hematopoietic cell putative protein-tyrosine kinase
  *-human placenta and peripheral leukocyte cDNAs (88)
Dsrc64: Drosophila gene product related to Src; polytene locus 64B
  *-D. melanogaster genomic DNA (89, 90)
Dsrc28: Drosophila gene product related to Src; polytene locus 28C
  *-D. melanogaster adult female cDNA (91)

B. Abl subfamily
Abl: cellular homolog of oncogene product from Abelson murine
  leukemia virus
  *-human fetal liver cDNA (92)
ARG: putative protein-tyrosine kinase related to Abl
  -human genomic DNA (partial) (93)
Dash: Drosophila gene product related to Abl
  *-D. melanogaster genomic DNA (90)
Nabl: nematode gene product related to Abl
  *-Caenorhabditis elegans genomic DNA (94)
Fes/Fps: cellular homolog of oncogene products from Gardner-Arnstein
  and Snyder-Theilen feline sarcoma viruses and Fujinami
  and PRCII avian sarcoma viruses
  *-human genomic DNA (95)
  -feline genomic DNA (96)
  -chicken genomic DNA (97)

C. Epidermal growth factor receptor subfamily
EGFR: epidermal growth factor receptor; cellular homolog of
  oncogene product (v-Erb-B) from AEV-H avian erythroblastosis
  virus
  *-human placenta and A431 cell line cDNAs (98)
NEU: cellular oncogene product activated in induced rat
  neuroblastomas (also called ERB-B2 or HER2)
  *-human placenta and gastric cancer cell line cDNAs (99)
  -rat neuroblastoma cell line cDNA (100)
DER: Drosophila gene product related to EGFR
  *-D. melanogaster genomic DNA (101)

D. Insulin receptor subfamily
INS.R: insulin receptor
  *-human placenta cDNA (102)
IGF1R: insulin-like growth factor 1 receptor
  *-human placenta cDNA (103)
DILR: Drosophila gene product related to INS.R
  *-D. melanogaster embryo cDNA (104)
Ros: cellular homolog of oncogene product from UR2 avian
  sarcoma virus
  *-human placenta genomic DNA (105)
  -chicken genomic DNA (106), chicken kidney cDNA (107)
7less: Drosophila sevenless gene product essential for R7
  photoreceptor cell development
  *-D. melanogaster eye imaginal disc cDNA (108)
TRK: colon carcinoma oncogene product activated by genetic
  recombination
  *-human tumor cell cDNA (109)
MET: N-methyl-N'-nitro-N-nitrosoguanidine (MNNG)-induced
  oncogene product
  *-human HOS cell line cDNA (110)

E. Platelet-derived growth factor receptor subfamily
PDGFR: platelet-derived growth factor receptor
  *-mouse NR6 fibroblast cell line cDNA (111)
CSF1R: colony-stimulating factor-type 1 receptor; cellular homolog
  of oncogene product (v-Fms) from McDonough feline sarcoma
  virus
  *-human placenta cDNA (112)
Kit: cellular homolog of oncogene product from Hardy-Zuckerman 4
  feline sarcoma virus
  *-human placenta cDNA (113)
RET: cellular oncogene product activated by recombination
  *-human T cell lymphoma cDNA (114)

F. Other receptor-like protein-tyrosine kinases
TKR11: putative protein-tyrosine kinase
  -chicken genomic DNA (partial) (115)
TKR16: putative protein-tyrosine kinase
  -chicken genomic DNA (partial) (115)


*Protein kinases that have catalytic domains included in the amino acid sequence alignment.

Conserved Features of the Catalytic Domains

To compare primary structures, we have aligned catalytic domains
from the 65 protein kinases marked by an asterisk in Tables 1 and 2
(Fig. 1). The 65 sequences represent each of the separate entries in
the Tables except for six family members that are not included
because their catalytic domain sequences have been only partially
determined. The alignment was made by eye and is parsimonious in
nature; the amount of gapping introduced into the sequences in
order to optimize positional similarities was kept to a minimum.
The alignment clearly demonstrates the overall similarity among the
catalytic domains. The catalytic domains are not conserved uniformly
but, rather, consist of alternating regions of high and low
conservation. Eleven major conserved subdomains are evident (Fig.
1, I to XI), separated by regions of lower conservation wherein fall
the larger gaps or inserts. Very large inserts (in excess of 60 residues)
occur in CDC7 between subdomains VII and VIII and between
subdomains X and XI, and in PDGFR, CSF1R, and Kit between
subdomains V and VI. A similarity profile of the aligned catalytic
domains provides a ready visualization of the subdomain structure
(Fig. 2). Such an arrangement of alternating regions of high and low
conservation is a common feature of homologous globular proteins
(10) and gives some clues to higher order structure. The conserved
subdomains must be important for catalytic function, either directly
as components of the active site or indirectly by contributing to the
formation of the active site through constraints imposed on secondary
structure. The nonconserved regions, on the other hand, are
likely to occur in loop structures, where folding allows the essential
conserved regions to come together.
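
The similarity profile mentioned above (Fig. 2) can be illustrated with a short Python sketch. This is not the authors' program. It assumes a plain identity score (1 for identical residues, 0 otherwise, with gaps never matching) in place of the "structure-genetic" scoring matrix cited in the Fig. 2 legend, it drops positions that lack a full averaging window, and the function names and toy alignment are hypothetical. Each column's summed pairwise score is expressed as a percentage of the highest possible score, then smoothed with a 9-position running average of which every third value is kept, as the legend describes.

```python
from itertools import combinations


def identity_score(a, b):
    """Stand-in scoring function: 1 for identical residues, 0 otherwise.
    A gap ('-') never scores as a match."""
    return 1.0 if a == b and a != "-" else 0.0


def column_scores(alignment, score=identity_score, best=1.0):
    """Relative similarity (0 to 100) for each column of an alignment."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    n_pairs = len(alignment) * (len(alignment) - 1) // 2
    scores = []
    for column in zip(*alignment):
        # Sum over all possible pairwise comparisons in this column.
        total = sum(score(a, b) for a, b in combinations(column, 2))
        scores.append(100.0 * total / (best * n_pairs))
    return scores


def running_average(values, window=9):
    """Centered running average; positions without a full window are dropped."""
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


# Hypothetical toy alignment (three sequences, twelve columns).
toy = [
    "GTGSFGRVMLVK",
    "GEGAFGKVYLAR",
    "GSGNFGTV-LVK",
]
# Smooth with a 9-position window, then keep every third position.
profile = running_average(column_scores(toy), window=9)[::3]
```

A different scoring matrix can be swapped in through the `score` and `best` arguments, where `best` is the score of an identical pair.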

Highly conserved individual amino acids within the catalytic
domains are expected to play important roles in catalysis. We will
refer to amino acid positions using the residue numbering for
bovine adenosine 3',5'-monophosphate (cAMP)-dependent protein
kinase catalytic subunit, α form (cAPK-α, Fig. 1). Nine
positions in the alignment contain the identical amino acid residue
in each of the 65 sequences. These invariant residues correspond to
cAPK-α: Gly52, Lys72, Glu91, Asp166, Asn171, Asp184, Gly186, Glu208,
and Arg280. An additional five positions contain the identical amino
acid in all but one of the sequences: Gly50, Val57, Phe185, Asp220,
and Gly225. Many of these most highly conserved residues directly
participate in adenosine triphosphate (ATP) binding and phosphotransfer.
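
As a rough illustration of how invariant and nearly invariant positions can be read off an alignment, the following Python sketch counts, for each column, how many sequences deviate from the most common residue. It is a minimal example under stated assumptions: the function name and the four-sequence toy alignment are hypothetical, a gap is counted as a deviation, and the 65-sequence alignment of Fig. 1 is not reproduced here.

```python
from collections import Counter


def conserved_positions(alignment, max_exceptions=1):
    """Group columns by how many sequences deviate from the majority residue.

    Returns a dict mapping the number of exceptions (0 = invariant,
    1 = identical in all but one sequence, ...) to a list of
    (column_index, residue) tuples. Column indices are 0-based.
    """
    n = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    result = {k: [] for k in range(max_exceptions + 1)}
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        if not counts:
            continue  # column made up entirely of gaps
        residue, count = counts.most_common(1)[0]
        exceptions = n - count  # gaps are counted as exceptions
        if exceptions <= max_exceptions:
            result[exceptions].append((col, residue))
    return result


# Hypothetical toy alignment, not taken from Fig. 1.
toy = [
    "GTGSFGRV",
    "GEGAFGKV",
    "GSGNFGTV",
    "GQGTYGVV",
]
hits = conserved_positions(toy)
# hits[0] lists invariant columns; hits[1] lists columns with one exception.
```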

The consensus Gly-X-Gly-X-X-Gly, found in many nucleotide
binding proteins in addition to the protein kinases (11), is found in
subdomain I, very near the catalytic domain amino terminus. The
invariant or nearly invariant residues corresponding to cAPK-α
Gly50 and Gly52 fall within this consensus. Only two positions on the
amino-terminal side of this consensus show conservation throughout
the protein kinase family; hydrophobic residues occupy positions
one and seven upstream from the first glycine in the consensus.
The amino terminus of some catalytic domain polypeptides lies as
close as ten residues from the first conserved glycine. A model for
the ATP-binding site of v-Src (12), based on the three-dimensional
structures from other nucleotide binding proteins, shows the Gly-X-Gly-X-X-Gly
residues forming an elbow around the nucleotide, with
the first glycine in contact with the ribose moiety and the second
glycine lying near the terminal pyrophosphate. A nearly invariant
valine residue lies within subdomain I, located just two positions on
the carboxyl-terminal side of the Gly-X-Gly-X-X-Gly consensus
(Val57 for cAPK-α) and may contribute to the positioning of the
conserved glycines.
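
A sequence can be scanned for the Gly-X-Gly-X-X-Gly consensus with a simple pattern match. The Python sketch below is illustrative only: the function name is hypothetical and the test fragment is an assumed example sequence, not one quoted from this article. For each hit it also reports the residue two positions on the carboxyl-terminal side of the last glycine, the position occupied by the nearly invariant valine.

```python
import re

# A lookahead is used so that overlapping occurrences are also reported.
GXGXXG = re.compile(r"(?=(G.G..G))")


def find_glycine_loops(seq):
    """Return one tuple per Gly-X-Gly-X-X-Gly hit:
    (1-based start, matched hexapeptide, residue two positions after the
    last glycine, or None if the sequence ends before that position)."""
    hits = []
    for m in GXGXXG.finditer(seq):
        start = m.start()   # 0-based index of the first glycine
        after = start + 7   # the last glycine sits at start + 5
        downstream = seq[after] if after < len(seq) else None
        hits.append((start + 1, m.group(1), downstream))
    return hits


fragment = "FERIKTLGTGSFGRVMLVKH"  # assumed example fragment
hits = find_glycine_loops(fragment)
```

A match alone does not establish that a protein is a kinase, since the text notes that the same consensus occurs in many other nucleotide binding proteins.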

In subdomain II lies an invariant lysine, corresponding to cAPK-α
Lys72, that is certainly the best characterized catalytic domain
residue. This lysine appears to be directly involved in the
phosphotransfer reaction, possibly mediating proton transfer (13). In cAPK-α
(14), v-Src (15), and EGFR (16), Lys72 or its equivalent reacts
with the ATP analog p-fluorosulfonyl 5'-benzoyl adenosine, thereby
inhibiting enzyme activity. Site-directed mutagenesis techniques
have been used to substitute alternate amino acids at this position in
v-Src (13, 17), v-Mos (18), v-Fps (19), EGFR (20), INS.R (21), and
PDGFR (22). All substitutions, including arginine, result in loss of
protein kinase activity. In all but three of the aligned sequences, an
alanine is present two positions on the amino-terminal side of the
invariant lysine in subdomain II. The invariant lysine lies 14 to 23
residues downstream of the last conserved glycine in subdomain I,
but no mutations have been made to test whether this spacing is
critical.

The central core of the catalytic domain, the region with greatest
frequency of highly conserved residues, consists of subdomains VI
through IX. The invariant or nearly invariant residues in subdomain
VI (corresponding to Asp166 and Asn171) and subdomain VII
(corresponding to Asp184, Phe185, and Gly186) also have been
implicated in ATP binding. These residues are part of a feature
found in a number of bacterial phosphotransferases that use ATP as
phosphate donor (23). The aspartic acid residues corresponding to
cAPK-α Asp166 and Asp184 may interact with the phosphate groups
of ATP through Mg2+ salt bridges (23). The triplet corresponding
to Asp184-Phe185-Gly186 in subdomain VII is of further interest in
that it represents the most highly conserved short stretch in the
catalytic domains. It is flanked for two positions on either side by
hydrophobic or near-neutral residues.

Subdomain VIII contains the consensus triplet Ala-Pro-Glu, a
conserved feature often mentioned as a key protein kinase catalytic
domain indicator (1). The invariant residue corresponding to
cAPK-α Glu208 contributes to the Ala-Pro-Glu consensus. In addition
to the conservation of these residues, several other lines of
evidence implicate this region as important in catalysis. Mutagenesis
studies have shown that each residue in the Ala-Pro-Glu consensus is
required for activity of v-Src (24). Other studies have provided
evidence that this consensus lies very near the catalytic site. An
affinity peptide substrate analog reacts with cAPK-α Cys199, thereby
inhibiting enzyme activity (25). Also, sites of autophosphorylation
found in many protein-tyrosine kinases (1) as well as cAMP-dependent
protein kinase [Thr197 (26)] lie within 20 residues
upstream of the Ala-Pro-Glu consensus. The role of this
autophosphorylation site is not entirely settled, but for several protein-tyrosine
kinases there is evidence that phosphorylation of this site
leads to increased catalytic activity (27). Autophosphorylation may
result in a conformational change that allows better access of
exogenous substrates to the active site.

Subdomains VI and VIII are of additional interest in that they
contain residues that are specifically conserved in either the
protein-serine/threonine or the protein-tyrosine kinases and, as such, may
play a role in recognition of the correct hydroxyamino acid. The
most striking indicator of amino acid specificity is found in subdomain
VI, lying between the invariant residues corresponding to
cAPK-α Asp166 and Asn171, two of the residues implicated in ATP
binding. The consensus Asp-Leu-Lys-Pro-Glu-Asn in this region is
a strong indicator of serine/threonine specificity, whereas the
protein-tyrosine kinase consensus is either Asp-Leu-Arg-Ala-Ala-Asn
(for the vertebrate members of the Src subfamily) or Asp-Leu-Ala-Ala-Arg-Asn
(for all others). Another such region is found in
subdomain VIII and lies immediately on the amino-terminal side of
the Ala-Pro-Glu consensus. This region is highly conserved among
the protein-tyrosine kinases with a more limited conservation
among the protein-serine/threonine kinases. The protein-tyrosine
kinase consensus through this region is Pro-Ile/Val-Lys/Arg-Trp-Thr/Met-Ala-Pro-Glu
while the protein-serine/threonine kinase
consensus is Gly-Thr/Ser-X-X-Tyr/Phe-X-Ala-Pro-Glu. These regions
in subdomains VI and VIII that indicate substrate specificity
have been targeted for the design of degenerate oligonucleotide
probes for use in screening cDNA libraries to identify novel
members of both the protein-serine/threonine (4) and
protein-tyrosine (28) kinase families.
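
The subdomain VI consensus sequences described above lend themselves to a simple lookup. The Python sketch below is a deliberately strict illustration: it assumes exact matches to the three hexapeptides written in single-letter code, whereas real sequences may deviate from a consensus at one or more positions, and the function name and test fragments are hypothetical.

```python
# Subdomain VI consensus hexapeptides in single-letter code.
SER_THR_MOTIFS = ("DLKPEN",)         # Asp-Leu-Lys-Pro-Glu-Asn
TYR_MOTIFS = ("DLRAAN", "DLAARN")    # Asp-Leu-Arg-Ala-Ala-Asn, Asp-Leu-Ala-Ala-Arg-Asn


def predict_specificity(seq):
    """Guess hydroxyamino acid specificity from exact consensus matches.

    Returns "serine/threonine", "tyrosine", or "undetermined" when no
    consensus, or consensus sequences of both kinds, are found.
    """
    has_ser_thr = any(motif in seq for motif in SER_THR_MOTIFS)
    has_tyr = any(motif in seq for motif in TYR_MOTIFS)
    if has_ser_thr and not has_tyr:
        return "serine/threonine"
    if has_tyr and not has_ser_thr:
        return "tyrosine"
    return "undetermined"


# Hypothetical fragments built around each consensus.
examples = {
    "HRDLKPENLL": predict_specificity("HRDLKPENLL"),
    "HRDLRAANIL": predict_specificity("HRDLRAANIL"),
    "HRDLAARNVL": predict_specificity("HRDLAARNVL"),
}
```

A practical screen would more likely score partial matches, and could add the subdomain VIII consensus as a second line of evidence.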

To date, no evidence has been reported concerning the possible
functions of residues in conserved subdomains III, IV, V, IX, X, and
XI. Subdomain IX contains a very well conserved short stretch that
includes the nearly invariant residues corresponding to Asp220 and
Gly225. Subdomains III and XI each contain an invariant residue,
corresponding to Glu91 and Arg280. The latter or its equivalent must
lie very near the catalytic domain carboxyl terminus. Arginine
residues occupying this position reside just 16 residues upstream
from both the CDC28 and HSVK polypeptide carboxyl termini,
and just 19 residues upstream from both the Mos and Fes carboxyl
termini. Deletion analysis of v-Src places the carboxyl terminus of
the catalytic domain of the protein-tyrosine kinases at a conserved
hydrophobic residue ten residues downstream of this arginine (8).
The point mutation conferring temperature sensitivity in some cdc28
mutants replaces this conserved arginine with glutamine (29).

A leap in our understanding of the functional roles of the
conserved catalytic domain residues will come with the solution of a
crystal structure for one of the protein kinase catalytic domains. The
similarities in primary structure should carry over to the higher order
structure and catalytic mechanism as well. Other investigators have
been making progress toward the solution of the three-dimensional
structure of cAPK-α (30).



Catalytic Domain Phylogeny

Amino acid sequence alignments can be used to deduce phylogenetic
relationships (31). We have used the alignment data from Fig.
1 to construct a phylogenetic tree of the protein kinase catalytic
domains (Fig. 3). All 65 of the sequences in the alignment are
included in the tree. They derive from both vertebrate and invertebrate
sources and, in some cases, presumed functional homologs
from both vertebrate and invertebrate sources are represented. The
tree, therefore, reflects catalytic domain evolution stemming from
gene duplication events (for example, when the vertebrate, mostly
human, sequences are compared), speciation events (when vertebrate
and invertebrate functional homologs are compared), or both.

The tree reveals a relation between catalytic domain sequence and
certain biochemical properties; catalytic domains from protein
kinases having similar modes of regulation or substrate specificities
tend also to have similar primary structures and cluster together
within the tree. Five major branch clusters are present in the tree: (i)
protein-tyrosine kinases, (ii) cyclic nucleotide- and
calcium-phospholipid-dependent protein kinases, (iii) calcium-calmodulin-dependent
protein kinases, (iv) protein kinases closely related to
SNF1, and (v) protein kinases closely related to CDC28. These
major clusters account for all but 12 of the 65 sequences included in
the tree. Generally, a sequence found within one of these clusters
shares in excess of 35% identical amino acids with each of the other
sequences in the cluster, whereas the catalytic domain sequences that
do not map within the same cluster have identities in the range of
20 to 25%.
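
The identity figures quoted above can be made concrete with a short Python sketch. It is a minimal illustration under stated assumptions: identity is computed over aligned positions where neither sequence has a gap (the article does not state which denominator was used), the 35% cutoff is applied as a rough rule of thumb rather than as the tree-building criterion, and the function names and sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identical residues between two aligned sequences.

    Positions where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def likely_same_cluster(a, b, threshold=35.0):
    """Rule of thumb from the text: members of one cluster share in excess
    of 35% identity, while unrelated catalytic domains share 20 to 25%."""
    return percent_identity(a, b) > threshold


# Hypothetical aligned fragments.
identity = percent_identity("AC-EF", "ACDEY")
```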

The most highly populated cluster contains all 27 confirmed or
putative protein-tyrosine kinases. The large number of protein-tyrosine
kinases probably reflects the intense research effort devoted
to this group, rather than a true indication of their abundance
relative to the protein-serine/threonine kinases. Branches leading to
the Src subfamily and to each of the three receptor subfamilies
diverge from the main line at about the same point. In light of the
oncogenic potential of many of the protein-tyrosine kinases, it is of
interest that the protein-serine/threonine kinases having the least
divergence from this group include Raf and Mos, cellular homologs
of retroviral oncogene products. However, another potentially
oncogenic protein-serine/threonine kinase, PIM-1, is not closely
related to the protein-tyrosine kinases.

The next most populous cluster in the tree includes two separate
subfamilies that can be classified according to their mode of
regulation: the cyclic nucleotide-dependent protein kinases and the
calcium-phospholipid-dependent protein kinases. The similarities in
the mode of regulation of the members of these two subfamilies,
namely, activation by "second messengers" released in response to
ligand binding at the cell surface, may be a reflection of their recent
evolutionary divergence.

The third major catalytic domain cluster contains the subfamily of
protein kinases that have activities regulated by calmodulin. The
calmodulin-dependent cluster falls near the cyclic nucleotide- and
calcium-phospholipid-dependent cluster. All members of the
calmodulin-dependent subfamily have a calmodulin binding domain,
characterized by a high proportion of basic amino acid residues and
having a propensity for formation of an amphiphilic α helix, residing
outside the catalytic domain. (Note that the calmodulin binding
domain sequences were not included in the phylogenetic analysis.)
The different protein kinases thus far described as being regulated by
calmodulin, therefore, appear to have diverged from a common
ancestor after acquisition of the calmodulin binding domain. The
mapping of the putative protein kinase PSK-H1 within this cluster
predicts that this enzyme will also prove to be regulated by
calmodulin.

Also mapping near the cyclic nudeotidc- and caldum-phospho- 
lipid-dcpcndent protein kinases is a small duster composed of four 
protein kinases recently identified in the budding or fission yeasts; 
SNFl, niml^, KINl, and KIN2. Whedier diesc protein kinases 

Rg. 1 . Multiple amino add sequence alignment of 65 protein kinase catalytic 
domains. The first 38 sequences derive from protcin-scrinc/dirconinc 
kinases (indicated by asterisks in Tabic 1) and the remaining 27 sequences in 
the alignment arc from protein-tyrosinc kinases (indicated by asterisks in 
Table 2). cAPK-a and Src have been chosen as prototype protcin-scrinc/ 
threonine and protein-tyrosinc kinases, respectively; their catalytic domain 
sequences arc numbered to indicate residue position from the polypeptide 
amino terminus. (Although the human Src sequence is shown, tiic number- 
ing is actually taken from the chicken Src sequence to maintain established 
convention). The number of additional amino- and carboxyl-tcrminal flank- 
ing residues lying outside the catalytic domains arc shown at the beginning 
and end, respectively, of each sequence. In several cases the sequences have 
not been determined through to the polypeptide amino or carboxyl tcrmim; 
for these, the number of determined residues is given foUowcd by a plus ( + ) 
sign. An asterisk (♦) at the beginning or end of a sequence indicates that no 
additional flanking residues arc contained in the polypeptide. Gaps, repre- 
sented by dashes, were introduced into the sequences to optimize the 
alignment. In six cases, k>ng insert segments have been cxdudcd from the 
alignment to shorten the figure. The positions and lengths of the cxduded 
inserts within the alignment are indicated by numbers within braces (for 
example, {-48-}); the cxduded gap positions in the other sequences rfiat 
correspond to these long inserts are shown as double slashes (//)• Residues 
conserved in 62 or more of the 65 sequences are shown as white letters in 
black boxes. Positions where residues of similar strucnire are conserved in 63 
or moK sequences are shown in shaded boxes. Structurally similar groupings 
used for this purpose are nonpolar chain R groups (M, L, I, V, and C); 
aromatic or ring-containing R groups; (F, Y, W, and H); anall R groups 
with near ncumd polarity (A, G, S, T, and P); addic and uncharged polar R 
groups (D, E, N, and Q); and basic polar R groups (K, R, and H). The 
single-letter amino add code is used (A, alanine; C, cysteine; D, aspartic 
add; E, glutamic acid; F, phenylalanine; G, glycine; H, histidinc; I, 
isoleudnc; K, lysine; L, leucine; M, methionine; N, asparaginc; P, proline; 
Q, glutamine; R, arginine; S, serine; T, threonine; V, valine; W, tryptophan; 
and Y, tyrosine). Roman nunKrals at bottom indicate conserved subdo- 
mains. 
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Rg. 2. Simiiarity profile of protein kinase catalytic domains. For each 
position in the ali^uncnt shown in Fig. 1, a relative similarity score was 
detcmiincd based on the "structure-genetic" scoring nutrix {216) for amino 
add similarities. Similarity scores were calculated as the sum of all possible 
painvisc comparisons between the individual amino adds at each position 
and expressed as the percentage of the highest possible score (that is, the
score obtained when an identical residue occupies the position in all 65
aligned sequences). To smooth out the curve, a 9-position running average
of the relative scores was determined, and every third position was plotted.
Positions that contain gaps for ten or more of the sequences were not
included in the profile; however, the locations of the major gap sites are
indicated by breaks in the curve. The mean relative score for all the positions
included in the profile is 66 with a standard deviation of 14.9. Relative
similarity scores, obtained when the catalytic domain sequences were
randomly scrambled had a mean of 47 and standard deviation of 1.85. Roman
numerals indicate conserved subdomains.

Fig. 3. Deduced phylogeny of protein kinase catalytic domains. The
phylogenetic tree was constructed from the multiple alignment shown in Fig. 1.
The tree-building concept of Fitch and Margoliash (117) was used as
implemented by Feng and Doolittle (118). Briefly, similarity scores were obtained
for all possible pairwise comparisons and transformed into a difference matrix from which
branch order and length were determined. Programs were run on a VAX-785 computer
equipped with 40 megabytes physical memory under virtual memory operating system
(VMS). Systems limitations required that the branch lengths for the protein-serine/
threonine and protein-tyrosine kinases be calculated separately, and the tree shown is thus
a composite of these two determinations. The position of the protein-tyrosine kinase
cluster was determined by including two protein-tyrosine kinases (Src and EGFR) in the
protein-serine/threonine kinase tree construction. The individual sequences are indicated
by the abbreviated names in Tables 1 and 2. The protein-tyrosine kinases are not labeled
in (A), but are shown in the cluster enlargement in (B). The tree is shown "unrooted" in
(A) as the branches are all measured relative to one another with no outside reference
point. The scale bars represent a branch length corresponding to a relative difference score
of 25. The tree depicted is likely to underestimate distances between the least related
members of the family, particularly since the alignment used in its construction is
parsimonious.

have similar modes of regulation remains to be determined. KIN1
and KIN2 were identified through screening a Saccharomyces cerevisiae
DNA library with probes designed to recognize sequences
characteristic of protein-tyrosine kinases and, as such, have been
suggested to represent "structural mosaics" with some features of
catalytic domain structure more indicative of the protein-tyrosine
kinases than the protein-serine/threonine kinases (5). The deduced
phylogeny of KIN1 and KIN2, however, does not suggest a close
evolutionary relationship with protein-tyrosine kinases. In fact, the
probe target used to identify KIN1 and KIN2 encodes the stretch of
amino acids corresponding to cAPK-α Asp^220-Gly^225 in conserved
subdomain IX, a region of high conservation in all of the catalytic
domains regardless of substrate specificity.
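
The smoothing described in the Fig. 2 caption above (a 9-position running average of the relative scores, with every third position plotted) can be sketched as follows. This is only an illustration: the example scores are hypothetical, and the handling of the profile ends (only full 9-position windows are averaged) is an assumption, since the caption does not say how the ends were treated.

```python
def running_average(scores, window=9):
    """Return (center_index, mean) pairs for every full window of the profile.

    Assumption: positions without a full window on both sides are skipped;
    the caption does not describe the edge handling that was actually used.
    """
    half = window // 2
    averaged = []
    for center in range(half, len(scores) - half):
        segment = scores[center - half:center + half + 1]
        averaged.append((center, sum(segment) / window))
    return averaged


def plotted_points(scores, window=9, step=3):
    """Smooth the profile, then keep every `step`-th smoothed position."""
    return running_average(scores, window)[::step]


# Hypothetical relative similarity scores (percent of the maximum score).
example_scores = [60, 62, 70, 75, 80, 78, 66, 55, 50, 52, 58, 64, 71, 69, 63]
points = plotted_points(example_scores)
```

With 15 input positions, seven full windows exist and every third one is kept, so three points would be plotted.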

The subfamily related to CDC28 includes functional homologs
from three widely divergent species: CDC28 from the budding
yeast S. cerevisiae, cdc2+ from the fission yeast Schizosaccharomyces
pombe, and human CDC2Hs. Functional homology was demonstrated
by heterologous complementation of conditional mutants
defective in cell cycle progression (32, 33). The other two sequences
mapping within this cluster are putative protein kinases identified in
Saccharomyces cerevisiae (KIN28) and human HeLa cells (PSK-J3).
The members of this cluster are also distinguished by the small sizes
of the catalytic domain-containing polypeptides, suggesting their
activities may be regulated through association with other polypeptides
in a holoenzyme complex. Indeed, support for this notion has
been obtained for cdc2+ (34).



Perspectives 

The tremendous diversity of the protein kinase family is just now
beginning to be appreciated. Most of the catalytic domain sequences
referenced in Tables 1 and 2 were reported within the past 2 years.
With continued characterizations of regulatory mutants in invertebrates,
along with the recent development of new hybridization
approaches for the identification of DNA clones that encode novel
protein kinase catalytic domains, it is likely that the rate of discovery
will continue to accelerate through the next several years. The
difficult tasks will be to confirm protein kinase activities for the
newly identified family members and to elucidate their functional
roles. Clues to function may come through an analysis of catalytic
domain primary structure and subsequent phylogenetic mapping. A
catalytic domain that has only limited divergence from another,
better characterized, member of the family can be expected to play a
similar role in cellular physiology. Further clues are likely to come
from an inspection of amino acid sequences lying outside the
catalytic domain where residues involved in enzyme regulation may
be found.



REFERENCES AND NOTES 

1. T. Hunter and J. A. Cooper, in The Enzymes, P. D. Boyer and E. G. Krebs, Eds.
(Academic Press, Orlando, FL, 1986), vol. 17, pp. 191-246.

2. A. M. Edelman, D. K. Blumenthal, E. G. Krebs, Annu. Rev. Biochem. 56, 567
(1987).

3. T. Hunter, Cell 50, 823 (1987).

4. S. K. Hanks, Proc. Natl. Acad. Sci. U.S.A. 84, 388 (1987).

5. D. E. Levin, C. I. Hammond, R. O. Ralston, J. M. Bishop, ibid., p. 6035.

6. W. C. Barker and M. O. Dayhoff, ibid. 79, 2836 (1982).

7. A. D. Levinson, S. A. Courtneidge, J. M. Bishop, ibid. 78, 1624 (1981); J. S.
Brugge and D. Darrow, J. Biol. Chem. 259, 4550 (1984); J. Y. J. Wang and D.
Baltimore, ibid. 260, 64 (1985); R. J. Bold and D. J. Donoghue, Mol. Cell. Biol.
5, 3131 (1985); I. Sadowski and T. Pawson, Oncogene 1, 181 (1987).

8. V. W. Wilkerson, D. L. Bryant, J. T. Parsons, J. Virol. 55, 314 (1985); P. Yaciuk
and D. Shalloway, Mol. Cell. Biol. 6, 2807 (1986).

9. We have attempted to include all catalytic domain sequence reports except those
encoded by retroviruses. Reports of partial sequences are included only if a
complete catalytic domain sequence is not available. We have chosen not to
include retroviral oncogene product sequences since the catalytic domains from
this group are effectively represented by their closely related cellular counterparts.
A listing of references for retroviral protein kinase sequences can be found in
Molecular Biology of Tumor Viruses: RNA Tumor Viruses [R. A. Weiss, N. Teich, H.
Varmus, J. Coffin, Eds. (Cold Spring Harbor Laboratory, Cold Spring Harbor,
NY, ed. 2, 1985)]. Since the time these tables were compiled, complete sequences
for six additional members of the protein kinase family have been published: (i)
EPH, a novel receptor-like protein-tyrosine kinase [H. Hirai, Y. Maru, K.
Hagiwara, J. Nishida, F. Takaku, Science 238, 1717 (1987)]; (ii) TKL, a novel
member of the Src subfamily [K. Strebhardt, J. I. Mullins, C. Bruck, H.
Rubsamen-Waigmann, Proc. Natl. Acad. Sci. U.S.A. 84, 8778 (1987)]; (iii) ninaC
protein, a Drosophila gene product essential for normal photoreceptor function
[C. Montell and G. M. Rubin, Cell 52, 757 (1988)]; (iv) nimA, a cell cycle
control protein kinase from Aspergillus [S. A. Osmani, R. T. Pu, N. R. Morris,
Cell 53, 237 (1988)]. Further, in Drosophila, a full-length sequence of cAMP- and
a partial sequence of cGMP-dependent protein kinase catalytic domain have been
reported [J. L. Foster, G. C. Higgins, F. R. Jackson, J. Biol. Chem. 263, 1676
(1988)] as have catalytic domain sequences from a number of vertebrate protein
kinases previously reported from other vertebrate species (not cited); (v) byr1, a
suppressor of sporulation defects in Schizosaccharomyces pombe [S. A. Nadin-Davis
and A. Nasim, EMBO J. 7, 985 (1988)]; and (vi) GCN2, a protein kinase
essential for translational derepression of GCN4 mRNA in Saccharomyces cerevisiae
[I. Roussou, G. Thireos, B. M. Hauge, Mol. Cell. Biol. 8, 2132 (1988)].

10. C. Chothia and A. M. Lesk, EMBO J. 5, 823 (1986).

11. R. K. Wierenga and W. G. J. Hol, Nature 302, 842 (1983).

12. M. J. E. Sternberg and W. R. Taylor, FEBS Lett. 175, 387 (1984).

13. M. P. Kamps and B. M. Sefton, Mol. Cell. Biol. 6, 751 (1986).

14. M. J. Zoller, N. C. Nelson, S. S. Taylor, J. Biol. Chem. 256, 10837 (1981).

15. M. P. Kamps, S. S. Taylor, B. M. Sefton, Nature 310, 589 (1984).

16. M. W. Russo, T. J. Lukas, S. Cohen, J. V. Staros, J. Biol. Chem. 260, 5205
(1985).

17. M. A. Snyder, J. M. Bishop, J. P. McGrath, A. D. Levinson, Mol. Cell. Biol. 5,
1772 (1985).

18. M. Hannink and D. J. Donoghue, Proc. Natl. Acad. Sci. U.S.A. 82, 7894 (1985).

19. G. Weinmaster, M. J. Zoller, T. Pawson, EMBO J. 5, 69 (1986).

20. W. S. Chen et al., Nature 328, 820 (1987); A. M. Honegger et al., Cell 51, 199
(1987).

21. C. K. Chou et al., J. Biol. Chem. 262, 1842 (1987).

22. L. T. Williams, personal communication.

23. S. Brenner, Nature 329, 21 (1987).

24. D. Bryant and J. T. Parsons, J. Virol. 45, 1211 (1983); Mol. Cell. Biol. 4, 862
(1984).

25. H. N. Bramson et al., J. Biol. Chem. 257, 10575 (1982).

26. S. Shoji et al., Proc. Natl. Acad. Sci. U.S.A. 78, 848 (1981); S. Shoji, L. H.
Ericsson, K. A. Walsh, E. H. Fischer, K. Titani, Biochemistry 22, 3702 (1983).

27. G. Weinmaster, M. J. Zoller, M. Smith, E. Hinze, T. Pawson, Cell 37, 559
(1984); G. Weinmaster and T. Pawson, J. Biol. Chem. 261, 328 (1986); T. E.
Kmiecik and D. Shalloway, Cell 49, 65 (1987); H. Piwnica-Worms, K. B.
Saunders, T. M. Roberts, A. E. Smith, S. H. Cheng, ibid., p. 75; R. Herrera and
O. M. Rosen, J. Biol. Chem. 261, 11980 (1986); L. Ellis et al., Cell 45, 721
(1986).

28. R. A. Lindberg, unpublished data.

29. A. Lörincz and S. I. Reed, Mol. Cell. Biol. 6, 4099 (1986).

30. J. M. Sowadski, N. H. Xuong, D. Anderson, S. S. Taylor, J. Mol. Biol. 182, 617
(1984).

31. R. F. Doolittle, in The Proteins, H. Neurath and R. L. Hill, Eds. (Academic Press,
New York, ed. 3, 1979), vol. 4, pp. 1-118.

32. D. Beach, B. Durkacz, P. Nurse, Nature 300, 706 (1982); R. Booher and D.
Beach, Mol. Cell. Biol. 6, 3523 (1986).

33. M. G. Lee and P. Nurse, Nature 327, 31 (1987).

34. L. Brizuela, G. Draetta, D. Beach, EMBO J. 6, 3507 (1987).

35. M. D. Uhler et al., Proc. Natl. Acad. Sci. U.S.A. 83, 1300 (1986).

36. M. O. Showers and R. A. Maurer, J. Biol. Chem. 261, 16288 (1986).

37. M. D. Uhler, J. C. Chrivia, G. S. McKnight, ibid., p. 15360.

38. J. F. Cannon and K. Tatchell, Mol. Cell. Biol. 7, 2653 (1987).

39. T. Toda, S. Cameron, P. Sass, M. Zoller, M. Wigler, Cell 50, 277 (1987).

40. J. Lisziewicz, A. Godany, H.-H. Forster, H. Kuntzel, J. Biol. Chem. 262, 2549
(1987).

41. K. Takio et al., Biochemistry 23, 4207 (1984).

42. P. J. Parker et al., Science 233, 853 (1986).

43. S. Ohno et al., Nature 325, 161 (1987).

44. L. Coussens et al., Science 233, 859 (1986).

45. J. L. Knopf et al., Cell 46, 491 (1986).

46. G. M. Housey, C. A. O'Brian, M. D. Johnson, P. Kirschmeier, I. B. Weinstein,
Proc. Natl. Acad. Sci. U.S.A. 84, 1065 (1987).

47. A. Rosenthal et al., EMBO J. 6, 433 (1987).

48. R. M. Hanley et al., Science 237, 293 (1987); C. R. Lin et al., Proc. Natl. Acad.
Sci. U.S.A. 84, 5962 (1987).

49. M. K. Bennett and M. B. Kennedy, Proc. Natl. Acad. Sci. U.S.A. 84, 1794 (1987).

50. E. M. Reimann et al., Biochemistry 23, 4185 (1984); E. F. da Cruz e Silva and P.
T. W. Cohen, FEBS Lett. 220, 36 (1987).

51. J. S. Chamberlain, P. VanTuinen, A. A. Reeves, B. A. Philip, C. T. Caskey, Proc.
Natl. Acad. Sci. U.S.A. 84, 2886 (1987).

52. K. Takio et al., Biochemistry 24, 6028 (1985).

53. V. Guerriero, Jr., M. A. Russo, N. J. Olson, J. A. Putkey, A. R. Means, ibid. 25,
8372 (1986).

54. The entire catalytic domain sequence has been determined by J. R. Woodgett
(unpublished data).

55. J. L. Celenza and M. Carlson, Science 233, 1175 (1986).




56. P. Russell and P. Nurse, Cell 49, 569 (1987).

57. A. T. Lörincz and S. I. Reed, Nature 307, 183 (1984).

58. J. Hindley and G. A. Phear, Gene 31, 129 (1984).

59. The entire catalytic domain sequence has been determined by S. K. Hanks
(unpublished data).

60. M. Simon, B. Seraphin, G. Faye, EMBO J. 5, 2697 (1986).

61. K. Takio et al., Proc. Natl. Acad. Sci. U.S.A. 84, 4851 (1987).

62. A. Saxena, R. Padmanabha, C. V. C. Glover, Mol. Cell. Biol. 7, 3409 (1987).

63. T. I. Bonner et al., Nucleic Acids Res. 14, 1009 (1986).

64. T. W. Beck, M. Huleihel, M. Gunnell, T. I. Bonner, U. R. Rapp, ibid. 15, 595
(1987).

65. M. Huleihel et al., Mol. Cell. Biol. 6, 2655 (1986).

66. G. E. Mark, T. W. Seeley, T. B. Shows, J. D. Mountz, Proc. Natl. Acad. Sci.
U.S.A. 83, 6312 (1986).

67. R. Watson, M. Oskarsson, G. F. Vande Woude, ibid. 79, 4078 (1982).

68. C. Van Beveren et al., Nature 289, 258 (1981).

69. F. A. Van der Hoorn and J. Firzlaff, Nucleic Acids Res. 12, 2147 (1984).

70. M. A. Teague et al., Proc. Natl. Acad. Sci. U.S.A. 83, 7371 (1986).

71. G. Boguslawski and J. O. Polazzi, ibid. 84, 5848 (1987).

72. M. Patterson, R. A. Sclafani, W. L. Fangman, J. Rosamond, Mol. Cell. Biol. 6,
1590 (1986).

73. P. Russell and P. Nurse, Cell 49, 559 (1987).

74. M. McLeod and D. Beach, EMBO J. 5, 3665 (1986).

75. G. Selten et al., Cell 46, 603 (1986).

76. D. J. McGeoch and A. J. Davison, Nucleic Acids Res. 14, 1765 (1986).

77. S. K. Anderson, C. P. Gibbs, A. Tanaka, H.-J. Kung, D. J. Fujita, Mol. Cell. Biol.
5, 1122 (1985); A. Tanaka et al., ibid. 7, 1978 (1987).

78. R. Martinez, B. Mathey-Prevot, A. Bernards, D. Baltimore, Science 237, 411
(1987).

79. T. Takeya and H. Hanafusa, Cell 32, 881 (1983).

80. R. E. Steele, Nucleic Acids Res. 13, 1747 (1985).

81. J. Sukegawa et al., Mol. Cell. Biol. 7, 41 (1987).

82. M. Nishizawa et al., ibid. 6, 511 (1986); R. C. Parker, G. Mardon, R. V. Lebo,
H. E. Varmus, J. M. Bishop, ibid. 5, 831 (1985).

83. K. Inoue et al., Oncogene 1, 301 (1987).

84. K. Semba et al., Proc. Natl. Acad. Sci. U.S.A. 83, 5459 (1986); T. Kawakami, C.
Y. Pennington, K. C. Robbins, Mol. Cell. Biol. 6, 4195 (1986).

85. Y. Yamanashi et al., Mol. Cell. Biol. 7, 237 (1987).

86. J. M. Trevillyan et al., Biochim. Biophys. Acta 888, 286 (1986); Y. Koga et al., Eur.
J. Immunol. 16, 1643 (1986).

87. J. D. Marth, R. Peet, E. G. Krebs, R. M. Perlmutter, Cell 43, 393 (1985); A. F.
Voronova and B. M. Sefton, Nature 319, 682 (1986).

88. N. Quintrell et al., Mol. Cell. Biol. 7, 2267 (1986); S. F. Ziegler, J. D. Marth, D.
B. Lewis, R. M. Perlmutter, ibid., p. 2376.

89. M. A. Simon, B. Drees, T. Kornberg, J. M. Bishop, Cell 42, 831 (1985).

90. F. M. Hoffmann, L. D. Fresco, H. Hoffman-Falk, B.-Z. Shilo, ibid. 35, 393
(1983).

91. R. J. Gregory, K. L. Kammermeyer, W. S. Vincent III, S. G. Wadsworth, Mol.
Cell. Biol. 7, 2119 (1987).

92. E. Shtivelman, B. Lifshitz, R. P. Gale, B. A. Roe, E. Canaani, Cell 47, 277
(1986).

93. G. D. Kruh et al., Science 234, 1545 (1986).

94. J. M. Goddard, J. J. Weiland, R. Capecchi, Proc. Natl. Acad. Sci. U.S.A. 83,
2172 (1986).

95. A. J. M. Roebroek et al., EMBO J. 4, 2897 (1985).

96. A. J. M. Roebroek, J. A. Schalken, C. Onnekink, H. P. J. Bloemers, W. J. M. Van
de Ven, J. Virol. 61, 2009 (1987).

97. C.-C. Huang, C. Hammond, J. M. Bishop, J. Mol. Biol. 181, 175 (1985).

98. A. Ullrich et al., Nature 309, 418 (1984).

99. L. Coussens et al., Science 230, 1132 (1985); T. Yamamoto et al., Nature 319,
230 (1986).

100. C. I. Bargmann, M.-C. Hung, R. A. Weinberg, Nature 319, 226 (1986).

101. E. Livneh, L. Glazer, D. Segal, J. Schlessinger, B.-Z. Shilo, Cell 40, 599 (1985).

102. A. Ullrich et al., Nature 313, 756 (1985); Y. Ebina et al., Cell 40, 747 (1985).

103. A. Ullrich et al., EMBO J. 5, 2503 (1986).

104. Y. Nishida, M. Hata, Y. Nishizuka, W. J. Rutter, Y. Ebina, Biochem. Biophys. Res.
Commun. 141, 474 (1986).

105. H. Matsushime, L.-H. Wang, M. Shibuya, Mol. Cell. Biol. 6, 3000 (1986); C.
Birchmeier, D. Birnbaum, G. Waitches, O. Fasano, M. Wigler, ibid., p. 3109.

106. W. S. Neckameyer, M. Shibuya, M.-T. Hsu, L.-H. Wang, ibid., p. 1478.

107. S. B. Podell and B. M. Sefton, Oncogene 2, 9 (1987).

108. E. Hafen, K. Basler, J.-E. Edstroem, G. M. Rubin, Science 236, 55 (1987).

109. D. Martin-Zanca, S. H. Hughes, M. Barbacid, Nature 319, 743 (1986).

110. M. Park et al., Proc. Natl. Acad. Sci. U.S.A. 84, 6379 (1987); A. M.-L. Chan et al.,
Oncogene 1, 229 (1987).

111. Y. Yarden et al., Nature 323, 226 (1986).

112. L. Coussens et al., ibid. 320, 277 (1986).

113. Y. Yarden et al., EMBO J. 6, 3341 (1987).

114. M. Takahashi and G. M. Cooper, Mol. Cell. Biol. 7, 1378 (1987).

115. D. A. Foster, J. B. Levy, G. Q. Daley, M. C. Simon, H. Hanafusa, ibid. 6, 325
(1986).

116. D.-F. Feng, M. S. Johnson, R. F. Doolittle, J. Mol. Evol. 21, 112 (1985).

117. W. M. Fitch and E. Margoliash, Science 155, 279 (1967).

118. D.-F. Feng and R. F. Doolittle, J. Mol. Evol. 25, 351 (1987).

119. We thank D.-F. Feng and R. F. Doolittle for much helpful advice concerning
phylogenetic tree construction; R. W. Holley for support; J. R. Woodgett for
making unpublished sequence data available; R. A. Lindberg and L. T. Williams
for allowing us to reference their unpublished work; K. Hyde for assistance in
sequence data entry; and L. Norris for help in preparation of Fig. 1. Supported by
grant GM38793 from the NIH (S.K.H.).



SCIENCE, VOL. 241 



Exhibit 9 



This material may be protected by Copyright law (Title 17 U.S. Code) 



Molecular and Cellular Biology, Nov. 1985, p. 3131-3138 Vol. 5, No. 11 

0270-7306/85/113131-08$02.00/0 

Copyright © 1985, American Society for Microbiology



Biologically Active Mutants with Deletions in the v-mos Oncogene

Assayed with Retroviral Vectors 

RICHARD J. BOLD and DANIEL J. DONOGHUE* 
Department of Chemistry, University of California at San Diego, La Jolla, California 92093

Received 20 March 1985/Accepted 12 August 1985 

We have constructed retroviral expression vectors by manipulation of the Moloney murine leukemia virus
genome such that an exogenous DNA sequence may be inserted and subsequently expressed when introduced
into mammalian cells. A series of N-terminal deletions of the v-mos oncogene was constructed and assayed for
biological activity with these retroviral expression vectors. The results of the deletion analysis demonstrate that
the portion of the p37mos coding region upstream of the third methionine codon is dispensable with respect to
transformation. However, deletion mutants of v-mos which allow initiation of translation at the fourth
methionine codon have lost the biological activity of the parental v-mos gene. Furthermore, experiments were
also carried out to define the C-terminal limit of the active region of p37mos by the construction of premature
termination mutants by the insertion of a termination oligonucleotide. Insertion of the oligonucleotide just 69
base pairs upstream from the wild-type termination site abolished the focus-forming ability of v-mos. Thus, we
have shown the N-terminal limit of the active region of p37mos to be between the third and fourth methionines,
while the C-terminal limit is within the last 23 amino acids of the protein.



Moloney murine sarcoma virus (M-MSV) was originally
isolated from a sarcoma which appeared after injection of
Moloney murine leukemia virus (M-MLV) into BALB/c
mice (32). M-MSV arose by recombination between
nondefective M-MLV and normal mouse cellular DNA (4,
13-15). The acquisition of the cellular DNA occurred
concomitantly with significant deletions of the parental M-MLV
genome. Thus, M-MSV is able to transform fibroblasts but is
defective for viral replication (1, 4, 35-37).

The acquired mouse cellular DNA of M-MSV has been
mapped to an uninterrupted sequence of approximately
1,200 base pairs (bp) near the 3' terminus of the M-MSV
genome (14, 23, 47) and is referred to as v-mos. The
nucleotide sequence of v-mos reveals that the 1,125-bp open
reading frame is an env-mos fusion such that the first five
codons of v-mos are derived from the M-MLV env gene and
the remainder are derived from c-mos (11, 45, 46).

A 37,000-dalton protein was first identified as the v-mos
gene product by in vitro transcription of M-MSV viral RNA
(34). Subsequently, a 37,000-dalton protein was
immunoprecipitated from M-MSV-transformed cells with an
antibody directed against a peptide corresponding to the
predicted C terminus of the v-mos gene product (36, 37).
This protein, referred to as p37mos, is presumably
responsible for transformation by M-MSV. p37mos exhibits limited
regions of amino acid sequence homology with the catalytic
subunit of cyclic AMP (cAMP)-dependent protein kinase (3).
While other oncogenic proteins which show a similar
homology with the cAMP-dependent protein kinase possess
protein kinase activity, such as p60src, no enzymatic activity has
been unequivocally demonstrated for p37mos (27).

We have constructed a series of deletions in the
N-terminal coding region of v-mos. In addition, premature
termination mutants of the v-mos gene were constructed.
These mutations define the region of the gene requisite for
transformation. The mutants of v-mos were expressed in
eucaryotic cells with retroviral expression vectors. The
retroviral vectors described here represent deleted
derivatives of M-MLV constructed such that the inserted DNA
fragment replaces the retroviral env gene. Our work allows
us to define the limits of the v-mos gene which are required
to encode a transforming gene product. Mutants of v-mos
with N-terminal deletions which allow initiation of
translation at the third in-frame ATG of v-mos still retain biological
activity, whereas mutants with more extensive N-terminal
deletions are biologically inactive. Premature termination
mutants have demonstrated that some portion of the
C-terminal 69 nucleotides is necessary to maintain biological
activity.

MATERIALS AND METHODS 

Construction of retroviral expression vectors. Many of the
details of the retroviral vectors are summarized in Fig. 1. The
proviral clone of M-MLV, p836, was initially described by
Hoffman et al. (22) and served as the parental plasmid for the
vectors described in this work. We desired a unique XhoI
restriction site in our vectors, which was achieved by the
following steps. First, the XhoI-HindIII fragment (nucleotides
[nt] 1560 to 4894 in the M-MLV sequence, reference 39),
which encompasses the gag-pol region of M-MLV, was
deleted. This was accomplished by blunt-end ligation of the
XhoI terminus to the HindIII terminus after treatment with
the Klenow fragment of DNA polymerase I. This protocol
destroyed the XhoI site but restored the HindIII site. Second,
the XhoI site downstream of the 3' long terminal repeat (LTR)
was removed by digestion with XhoI, followed by treatment
with Klenow fragment, and religation. Third, the ClaI site in
the env gene (nt 7674) was converted to an XhoI site by the
insertion of an XhoI linker (CCTCGAGG). It should be noted
that there is also a second ClaI site in M-MLV, at nt 4980, but
this site is methylated in DNA grown in dam+ strains of
Escherichia coli and is refractory to cleavage by ClaI.

To position the XhoI restriction site at the correct location
in the vectors, the following sequence of steps was
undertaken. First, the HindIII-HhaI fragment of M-MLV (nt 4894
to 5780 of M-MLV), which contains the 3' splice acceptor
site of the env gene and also the initiator ATG for the env
gene product, was subcloned as an HindIII-EcoRI fragment
to yield the plasmid pDD90. Furthermore, the EcoRI site
was converted to an XhoI site by linker insertion
(CCTCGAGG) to yield the clone pDD92. Second, the
subclone pDD90 was linearized at the EcoRI site, which
replaced the former HhaI site (at nt 5780), and treated with
BAL 31 nuclease (28). After BAL 31 digestion, XhoI linkers
(CCTCGAGG) were ligated onto the digested DNA. Third,
several of the resulting plasmids were isolated, and the
position of the XhoI linker was determined in each by
nucleotide sequencing (30). One such clone, pDD94, had 29
bp removed by BAL 31, including the env gene ATG. In this
clone, the XhoI site was located at nt 5751 of M-MLV,
whereas the env gene ATG was located at nt 5777. The final
step in the construction of the vectors was to replace the
HindIII-XhoI fragment of the M-MLV derivative pDD99
with the HindIII-XhoI fragment of each of the subclones
pDD94 and pDD92 described above. Insertion of the
HindIII-XhoI fragment of pDD94 gave rise to the vector
pDD102, while insertion of the HindIII-XhoI fragment of
pDD92 gave rise to the vector pDD103.
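
The coordinates quoted for pDD94 can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable names are illustrative, and treating the BAL 31 digestion as a simple upstream subtraction from the linearization site is a simplifying assumption.

```python
# Values taken from the text above.
ECORI_SITE_NT = 5780      # EcoRI site that replaced the former HhaI site
BP_REMOVED = 29           # bp removed by BAL 31 in clone pDD94
ENV_ATG_NT = 5777         # position of the env gene ATG
XHOI_LINKER = "CCTCGAGG"  # XhoI linker sequence
XHOI_SITE = "CTCGAG"      # XhoI recognition sequence

# Assumption: digestion is modeled as removing bases upstream of the cut site.
xhoi_site_nt = ECORI_SITE_NT - BP_REMOVED

# The env ATG is lost if it lies within the removed stretch.
atg_removed = xhoi_site_nt < ENV_ATG_NT <= ECORI_SITE_NT

# The linker should carry a complete XhoI recognition sequence.
linker_has_xhoi_site = XHOI_SITE in XHOI_LINKER

print(xhoi_site_nt, atg_removed, linker_has_xhoi_site)
```

The computed position agrees with the nt 5751 reported for the XhoI site in pDD94, and nt 5777 falls inside the removed stretch, consistent with loss of the env gene ATG.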

The vectors pDD102 and pDD103 are similar in the
following details. (i) The XhoI-HindIII region of M-MLV (nt
1560 to 4894) is deleted. (ii) Most of the env gene, from HhaI
to ClaI (nt 5780 to 7674), is deleted. (iii) The 5' splice donor
site and the 3' splice acceptor site for the env gene are
retained. The two vectors differ in that pDD102 lacks the env
gene ATG codon, whereas pDD103 retains the env gene
ATG codon.

The clone pDD98 is similar to the clone pRB15, shown in
Fig. 2, which contains the wild-type v-mos gene inserted into
the vector pDD102. pDD98 differs from pRB15, however, by
the retention of the gag-pol region of M-MLV, contained on
the XhoI-HindIII fragment (nt 1560 to 4894 in the M-MLV
sequence, reference 39), as shown in Fig. 1B.

Construction of deletion mutants in the mos gene. The
v-mos gene which served as the starting material for the
isolation of deletion mutants was the XbaI to HindIII
fragment of the 124-MSV strain contained in the plasmid pDDO
(11). The long open reading frame of the v-mos gene which
encodes p37mos is conveniently flanked by a unique XbaI site
upstream and a unique HindIII site downstream. The
HindIII restriction site downstream of the coding region was
first converted to an XhoI site by insertion of an XhoI linker
(CCTCGAGG). The resulting plasmid was then linearized at
the XbaI site upstream of the coding region and treated with
BAL 31 exonuclease (28) for varying lengths of time. XhoI
linkers were then ligated to the DNA, yielding mos genes
deleted for various distances and flanked by XhoI sites. The
exact length of deletion was determined by Maxam-Gilbert
sequencing (30) with 3'-end-labeled restriction fragments.
The endpoint of each deletion, as determined by sequencing,
is indicated in Fig. 2.
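
To illustrate how a deletion endpoint determines which in-frame ATG is the first one left for initiation, here is a minimal sketch. The open reading frame and endpoints are toy values invented for the example; they are not the v-mos sequence or the actual deletion endpoints, and the function name is hypothetical.

```python
def first_in_frame_atg(orf, deletion_endpoint):
    """Return (rank, position) of the first in-frame ATG at or after the endpoint.

    `orf` starts at the original initiator ATG. Positions are 0-based and
    `deletion_endpoint` is the first base that remains after the deletion.
    `rank` counts in-frame ATGs from 1 in the undeleted reading frame.
    Returns None if no in-frame ATG remains.
    """
    in_frame = [i for i in range(0, len(orf) - 2, 3) if orf[i:i + 3] == "ATG"]
    for rank, position in enumerate(in_frame, start=1):
        if position >= deletion_endpoint:
            return rank, position
    return None


# Toy ORF (hypothetical): in-frame ATGs at positions 0, 6, and 15.
toy_orf = "ATGGCAATGCCCAAAATGGGGTGA"

intact = first_in_frame_atg(toy_orf, 0)
short_deletion = first_in_frame_atg(toy_orf, 4)
longer_deletion = first_in_frame_atg(toy_orf, 7)
no_start_left = first_in_frame_atg(toy_orf, 16)
```

In this toy case a deletion ending at position 4 leaves the second ATG as the first available start, and one ending at position 7 leaves the third.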

FIG. 1. (A) Construction of retroviral expression vectors.
Shown is the genome of M-MLV with the splice sites and the
retroviral genes denoted. Restriction sites used in the construction
of the expression vectors are also shown as well as the
corresponding nucleotide positions (39). Details of the construction of pDD102
and pDD103 are given in Materials and Methods. Each vector
contains a unique XhoI site for insertion of a DNA fragment in place
of the retroviral env gene. In the vector pDD102, the env gene
initiation codon has been removed, so a DNA insert must provide its
own ATG codon for the initiation of translation. In the vector
pDD103, the env gene initiation codon remains and will be fused to
any DNA fragment inserted at the XhoI site. The lower portion of the
figure shows the entire plasmids with the splice sites, LTR sequences,
and important restriction sites denoted. Both pDD102 and pDD103
are approximately 15.8 kilobases in size. (B) Comparison of two
vectors which express v-mos. The genome structures of pDD98 and
pRB15 are compared with the parental retroviral genome of M-MLV.
pDD98 retains the complete gag-pol region of M-MLV, whereas most
of the gag-pol region has been deleted from pRB15. Given that pRB15
demonstrated greater biological activity than pDD98 (see text and
Table 1), a vector with this same deletion (pDD102) was used for the
assay of all v-mos deletion mutants.

Construction of C-terminal premature termination
mutants. The v-mos gene was linearized at one of three unique
restriction sites, KpnI, HpaI, or SstII, in the long open
reading frame (see Fig. 3). The linear fragments were then
treated with the Klenow fragment of DNA polymerase in the
presence of all four deoxyribonucleoside triphosphates to
yield blunt ends. The termination oligonucleotide
(TCAATCAGTCAAGCTTGACTGATTGA) was then
ligated onto the DNA. This oligonucleotide is
self-complementary and contains termination codons in all three
reading frames. The presence of the oligonucleotide in the
resulting plasmids was monitored by the acquisition of an
HindIII site contained within the oligonucleotide.
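
The three properties stated for the termination oligonucleotide (self-complementarity, a stop codon in every reading frame, and an internal HindIII site) can be checked directly from its sequence. The sketch below is an illustration only; the helper names are hypothetical, and AAGCTT is the standard HindIII recognition sequence.

```python
OLIGO = "TCAATCAGTCAAGCTTGACTGATTGA"  # termination oligonucleotide from the text
STOP_CODONS = {"TAA", "TAG", "TGA"}
HINDIII_SITE = "AAGCTT"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def stop_codons_by_frame(seq):
    """Map each reading frame (0, 1, 2) to the stop codons found in it."""
    frames = {}
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        frames[frame] = [codon for codon in codons if codon in STOP_CODONS]
    return frames


is_self_complementary = reverse_complement(OLIGO) == OLIGO
stops = stop_codons_by_frame(OLIGO)
has_hindiii_site = HINDIII_SITE in OLIGO
```

Because the sequence equals its own reverse complement, the same stop codons are presented whichever way the oligonucleotide is ligated into the blunt-ended site.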

Biological assay. The biological activity of the various
mutants of the v-mos gene was assayed with the expression
vector pDD102. The XhoI-XhoI fragment of each mutated
gene was inserted into the unique XhoI site of pDD102.
Colony hybridization was used to ensure the presence of an
insert, and restriction mapping was used to define the correct
orientation of the insert (20).



[Figure 2 diagram: v-mos inserted as an XhoI-XhoI fragment in pDD102, with the TGA
terminator marked; deletion mutants pRB15, pRB16, pRB6, pRB3, pRB1, pRB9, pRB19,
pRB5, pRB18, and pRB8 are listed with their focus assay results.]

FIG. 2. N-terminal deletion analysis of the v-mos gene. The
structure of the M-MLV-derived expression vector pDD102 is
shown with the full-length wild-type v-mos gene inserted as an
XhoI-XhoI fragment. The long open reading frame of v-mos has
been expanded to show the five ATG codons in the 1125-bp coding
sequence. The ATG codons are represented by solid circles, and the
opal terminator codon is also shown. Nucleotide numbers
associated with the ATG codons and the terminator codon are from the
published sequence of 124-MSV (46). The nucleotide position at
each deletion endpoint is indicated, as determined by
Maxam-Gilbert sequencing; the numbers refer to the first remaining base
pair of v-mos after the XhoI linker. The biological activity of each
deletion mutant was assayed by its ability to induce foci when
introduced into NIH 3T3 cells. A plus indicates a minimum value of
10^[illegible] FFU/pmol of cloned DNA. A minus indicates a maximum value
of 8.3 x 10^[illegible] FFU/pmol.

TABLE 1. Transforming activities of deletion mutants^a

                    First ATG   Site of termination   Biological activity      Rescued virus titer
Plasmid             remaining   linker                (FFU/pmol)               (FFU/ml)

pDD98               1                                 7.4 x 10^2               1.7 x 10^[illegible]
pRB15 (wild type)   1                                 1.3 x 10^[illegible]     3.2 x 10^[illegible]
pRB16               2                                 1.2 x 10^[illegible]     6.0 x 10^[illegible]
pRB6                2                                 1.1 x 10^[illegible]     9.5 x 10^[illegible]
pRB3                2                                 1.0 x 10^[illegible]     9.0 x 10^[illegible]
pRB1                2                                 1.0 x 10^[illegible]     9.7 x 10^[illegible]
pRB9                3                                 1.1 x 10^[illegible]     1.1 x 10^[illegible]
pRB19               3                                 1.1 x 10^[illegible]     1.2 x 10^[illegible]
pRB5                4                                 <6.3 x 10^[illegible]    <5.0 x 10^0
pRB18               4                                 <8.3 x 10^[illegible]    <5.0 x 10^[illegible]
pRB8                4                                 <6.0 x 10^[illegible]    <5.0 x 10^[illegible]
pRB27                           KpnI                  9.7 x 10^[illegible]     4.0 x 10^[illegible]
pRB28                           HpaI                  1.3 x 10^[illegible]     <1.0 x 10^[illegible]
pRB26                           SstII                 4.1 x 10^[illegible]     <2.5 x 10^[illegible]

^a The upper portion of the table indicates the biological activity of the
N-terminal deletion mutants. The table also denotes the first in-frame ATG
codon available for initiating translation in each of the deletion mutants. The
biological activity of each plasmid was assayed as described in Materials and
Methods. Values shown are calculated as focus-forming units per picomole
(FFU/pmol) of transfected DNA. The lower portion of the table shows the
biological activity of the premature termination mutants. The restriction sites
at which the termination linker was inserted are denoted. The virus titer of
collected culture fluid from the transfection plates was determined as
described in Materials and Methods. The titers shown are calculated as
focus-forming units per milliliter of culture fluid (FFU/ml) which was collected
approximately 7 days after transfection.



The general protocol used for the focus assay was that
described by Graham and Van der Eb (19). A 1-µg portion of
each DNA, together with 0.5 µg of DNA of the
replication-competent clone of M-MLV, p836 (22), was transfected in
the presence of 15 µg of sheared calf thymus DNA. Each
sample was applied to a 5-cm plate of approximately 50%
confluent NIH 3T3 cells as a calcium phosphate
coprecipitate. After approximately 12 to 18 h, the plates were
trypsinized, and the cells were distributed to four 10-cm plates. The
focus assays were then scored 12 to 14 days after
transfection. Results of the focus assays are shown in Table 1. These
results were also confirmed by collecting the culture fluid
from the transfection plates and determining titers for this
culture fluid on fresh monolayers of NIH 3T3 cells for
focus-forming virus. The results of the transforming virus
titers are shown in Table 1 and were always consistent with
the results of the primary transfection assays.
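
Table 1 reports activity as focus-forming units per picomole of transfected DNA. One way such a value could be derived from a focus count is sketched below. This is an illustration under stated assumptions, not the authors' calculation: the average mass of 650 g/mol per base pair is a common approximation, and the plasmid length and focus count in the example are hypothetical.

```python
AVG_BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one base pair of dsDNA


def pmol_of_plasmid(mass_ug, length_bp):
    """Convert a mass of double-stranded plasmid DNA (in µg) to picomoles."""
    grams = mass_ug * 1e-6
    moles = grams / (length_bp * AVG_BP_MASS_G_PER_MOL)
    return moles * 1e12


def ffu_per_pmol(foci_counted, mass_ug, length_bp):
    """Express a focus count as focus-forming units per picomole of DNA."""
    return foci_counted / pmol_of_plasmid(mass_ug, length_bp)


# Hypothetical example: 1 µg of a 17,000-bp plasmid yielding 20 foci in total.
example_pmol = pmol_of_plasmid(1.0, 17_000)
example_activity = ffu_per_pmol(20, 1.0, 17_000)
```

Under these assumptions, 1 µg of a plasmid of this size is roughly 0.09 pmol, so each focus counted corresponds to about 11 FFU/pmol.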

RESULTS 

Construction of retroviral expression vectors. We
constructed retroviral vectors which permit the expression in
eucaryotic cells of an inserted gene (2, 8, 10, 17, 18, 21, 24,
33, 43). The vectors are constructed in such a way that the
inserted DNA fragment is substituted for the env gene of
M-MLV (21, 39). Consequently, the mRNA of the inserted
gene will be transcribed and spliced in a fashion identical to
that of the M-MLV env gene. Furthermore, the vectors
provide LTRs both 5' and 3' to the insert which are required
for the enhancement of transforming activity of v-mos as
well as recovery of infectious virus (4, 44, 49). The two
vectors described here, pDD102 and pDD103, contain 5' and
3' LTRs, 5' and 3' splice sites for the env gene, similar
deletions, and a unique XhoI site for insertion of a DNA
fragment (Fig. 1). However, pDD102 and pDD103 differ by
the absence or presence of the env gene ATG codon.

In the vector pDD103, a unique XhoI site was inserted
directly downstream of the env gene ATG codon so that an
inserted gene need not provide its own ATG for initiation of
translation when expressed. In contrast, pDD102 lacks the
env gene initiation codon. Treatment with BAL 31 nuclease
was used to remove the env gene ATG and a short region
upstream (Fig. 1). The expression of a gene inserted at the
unique XhoI site of pDD102 requires that the inserted gene
provide its own initiation codon for translation.

The parental v-mos gene was inserted into the vector
pDD102 to yield the plasmid pRB15. This plasmid was
biologically active when transfected into NIH 3T3 cells and
scored in a conventional focus assay (Table 1). When the
transfection was carried out in the presence of DNA of a
clone of infectious M-MLV, it was possible to demonstrate
transmissible transforming virus in the culture fluid from the
transfected cells. We also constructed another retroviral
vector, pDD98, which contains the wild-type v-mos gene.
This construct is similar to pRB15 but retains the complete
gag-pol region from M-MLV. The genome structures of
M-MLV, pRB15, and pDD98 are summarized in Fig. 1B.
pDD98 consistently yielded fewer foci (7.4 x 10^ foci per
pmol) compared with pRB15 (1.3 x 10^ foci per pmol) (Table
1). Also, the titer of recovered transforming virus was lower
for pDD98 (1.7 x 10^ focus-forming units [FFU]/ml) com-
pared with pRB15 (3.2 x 10^ FFU/ml). The low transforma-
tion frequency of pDD98 was shown not to be due to the
reported cytotoxicity of v-mos (37), as the transformation
efficiency of the herpesvirus thymidine kinase gene was not
lowered by cotransfection with pDD98 (data not shown).
Due to the higher biological activity of the clone pRB15
compared with the clone pDD98, we decided to use the
vector pDD102 for subsequent analysis of the v-mos deletion
mutants described below.

N-terminal deletions of v-mos. To delimit the region of the
v-mos gene necessary for neoplastic transformation, we
constructed a series of deletions of the N-terminal region.
The N-terminal region of v-mos was treated with BAL 31
nuclease for varied amounts of time to yield deletions in this
region. The position of the deletion endpoints varied from
just beyond the first ATG (pRB16), a deletion of 18 bp, to
just before the fourth ATG (pRB8), which has 510 bp of
v-mos removed (Fig. 2). The deleted v-mos genes, after
insertion into the expression vector pDD102, were intro- 
duced into NIH 3T3 cells by the calcium phosphate 
coprecipitation technique to test the effect of each deletion 
on the transforming activity (19). 

The biological activity of these deletions, as determined
by standard focus assay, is indicated by + or - in Fig. 2, and
quantitation is presented in Table 1. Deletion mutants with
transforming activities denoted as - did not possess any
transforming activity above the level of detection. This was
confirmed by assaying the culture fluids for the presence of
transmissible focus-forming virus, as described in Table 1.
Mutants with deletions of only the first ATG codon of the
open reading frame, e.g., pRBl, as well as mutants with
deletions of both the first and second ATG codons, e.g.,
pRB9, were still biologically active. However, deletions
extending beyond the third ATG codon abolished biological
activity. Therefore, v-mos deletion clones such as pRB9 and
pRB19, which presumably initiate translation at the third
ATG, produce polypeptides capable of transforming NIH
3T3 cells. This is in contrast to the v-mos deletion clones,
such as pRB8, which must initiate translation at the fourth
ATG and are incapable of transformation. Thus, the region
of v-mos upstream of the third ATG codon is unnecessary
for neoplastic transformation activity.
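
The reasoning used above, that a 5' deletion forces translation to begin at the first remaining in-frame ATG, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the vector is taken to supply no initiation codon of its own (as described for pDD102), the deletion is modeled as removing a given number of base pairs from the 5' end of the open reading frame, and the toy sequence and function names are hypothetical rather than taken from v-mos.

```python
from typing import List, Optional


def in_frame_atgs(orf: str) -> List[int]:
    """Return 0-based offsets of ATG codons in frame with the first codon."""
    return [i for i in range(0, len(orf) - 2, 3) if orf[i:i + 3] == "ATG"]


def first_start_after_deletion(orf: str, deleted_bp: int) -> Optional[int]:
    """Offset (in the original ORF) of the first in-frame ATG that survives
    removal of `deleted_bp` base pairs from the 5' end, or None if none do."""
    for pos in in_frame_atgs(orf):
        if pos >= deleted_bp:
            return pos
    return None


# Hypothetical toy ORF with in-frame ATG codons at offsets 0, 6, and 15.
TOY_ORF = "ATGGCCATGAAATTTATGCCCTGA"

for deleted in (0, 4, 7, 16):
    print(deleted, first_start_after_deletion(TOY_ORF, deleted))
```

A deletion that removes every in-frame ATG upstream of an essential region leaves only initiation sites that yield a truncated, inactive product, which is the pattern the focus assays distinguish.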



It must be noted that the transforming ability of the 
deletions is below the level of wild-type v-mos activity. As
observed in Table 1, there is approximately a 10-fold reduc-
tion in the transforming activity of the v-mos deletions below
that of the wild-type gene. This might result from initiation at 
out-of-frame ATG codons which are upstream from the 
correct, in-frame ATG (11). However, this seems unlikely as 
all the transforming deletions exhibit the same reduction in 
transforming activity, whereas they do not all possess out-
of-frame ATG codons upstream of the first remaining in-
frame initiation codon. Another possible reason for the
decrease in activity involves protein structure. While the
N-terminal region upstream of the third ATG may not be
absolutely required for transforming activity, it may play a
significant role in the stability or the correct tertiary struc- 
ture of the protein. The important point to note, though, is 
that the mutants with deletions up to but not including the 
third ATG codon still possess the ability to transform cells, 
albeit not as efficiently as the wild-type parent. 

A cytotoxic effect has been reported during acute infection 
of NIH 3T3 cells by 124-MSV (37). However, periodic 
examination of the transfected cells by phase-contrast mi- 
croscopy revealed no evidence of extensive cell death, as 
would be expected if any of these constructs resulted in 
cytotoxicity. 

Premature termination of v-mos by oligonucleotide inser-
tion. We also constructed mutations of the v-mos gene
specifying premature termination of translation. A self-
complementary oligonucleotide (Fig. 3) was synthesized
which contains opal termination codons in all three reading
frames. This oligonucleotide was then inserted at three
unique restriction sites in v-mos (Fig. 3). These mutants
were assayed for focus-forming activity in the vector
pDD102, and all were biologically inactive. Thus, the C-
terminal limit of the gene required for the transforming
ability of v-mos is beyond the 55/11 site at the 352nd amino
acid of the protein, since termination at this site destroys
biological activity. This indicates that some portion of the 23
amino acids of the C terminus is essential for v-mos to
transform NIH 3T3 cells.

FIG. 3. Premature termination of the v-mos gene by oligonucle-
otide insertion. The full-length coding region of the v-mos gene is
shown from the initiation codon (ATG) to the termination codon
(TGA). The nucleotide positions of these codons and of the restric-
tion sites indicated refer to the sequence of 124-MSV (46). The
synthetic oligonucleotide as shown was inserted at each of three
restriction sites so as to cause premature termination. Upon inser-
tion of this linker at any restriction site, regardless of reading frame,
translation will be terminated due to the presence of termination
codons in all three possible reading frames. The biological activity of
the premature termination mutants was assayed by their ability to
induce focus formation in NIH 3T3 cells. All termination mutants
possessed no biological activity and had a maximum value of 9.7 x
10' FFU/pmol.
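
The two properties required of such a termination linker, self-complementarity and an opal (TGA) codon in each of the three reading frames, can be checked computationally. The sketch below is illustrative only: the 22-nucleotide sequence is a hypothetical linker constructed for this example and is not the oligonucleotide shown in Fig. 3, and the function names are likewise assumptions.

```python
from typing import Set

# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_self_complementary(seq: str) -> bool:
    """True if the sequence reads the same on both strands (5' to 3')."""
    return seq == reverse_complement(seq)


def frames_with_codon(seq: str, codon: str = "TGA") -> Set[int]:
    """Reading-frame offsets (0, 1, 2), relative to the first base of `seq`,
    in which `codon` occurs at least once."""
    return {i % 3 for i in range(len(seq) - 2) if seq[i:i + 3] == codon}


# Hypothetical linker: a TGA-containing half joined to its reverse complement.
HYPOTHETICAL_LINKER = "TGAATGAATGATCATTCATTCA"

print(is_self_complementary(HYPOTHETICAL_LINKER))
print(sorted(frames_with_codon(HYPOTHETICAL_LINKER)))
```

Because a self-complementary linker presents the same sequence in either orientation, a stop codon in every frame is guaranteed to be read whichever way the linker ligates and whatever the reading frame at the insertion site.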

DISCUSSION 

Retroviral expression vectors have become the method of 
choice for the introduction of foreign genetic material into 
eucaryotic cells (8, 10, 21, 24, 26, 31, 33, 38, 41, 43, 48). The 
major characteristics which can be used to differentiate the
various retroviral expression vectors are: (i) the retroviral 
gene which is replaced with the exogenous gene, (ii) the 
presence or absence of a selectable marker within the 
expression vector, and (iii) whether the expression vector 
provides a promoter for the inserted gene. 

Our expression vectors, derived by deletion of M-MLV, 
allow for the replacement of the viral env gene sequence by 
the desired foreign gene while maintaining both LTR se- 
quences, splicing signals, and packaging signals. The expres- 
sion vectors are replication defective and therefore must be 
accompanied by coinfection by a helper virus if a transmis- 
sible virus stock is desired (29). While the vectors we 
constructed do not have a selectable marker other than the 
gene insert, there exists a unique HindIII restriction site in
the deleted gag-pol region for the insertion of such a 
selection marker. Two other groups have constructed retro- 
viral expression vectors which are similar to those described 
in this work and deserve special note. Cepko et al. (8) have 
constructed an expression vector derived from M-MLV such 
that a foreign gene may be inserted so as to replace the 
retroviral env gene. A special feature of this vector is the 
presence of the Tn5 transposon-derived neomycin resistance 
gene which replaces the gag-pol region of the parental 
retrovirus. Hwang and Gilboa (24) constructed M-MLV- 
derived vectors which replace the env gene of the retrovirus 
with the neomycin resistance gene derived from Tn5. Our 
vectors, however, contain deletions within the gag-pol re- 
gion, whereas the vectors of Hwang and Gilboa retain intact 
the gag-pol region. 

In this work, we described two very similar retroviral 
constructs, pDD98 and pRB15, which express the v-mos 
oncogene but which differ by the presence or absence of the 
gag-pol region. Hwang et al. (25) have previously reported 
the existence of two regions in the gag-pol region which are 
required for efficient env gene expression. However, when 
the v-mos oncogene is inserted into the deleted vector, 
expression is more efficient than when inserted in the 
nondeleted vector. This increased expression in the deleted 
vector is reflected both in higher numbers of foci per 
picomole of transfected DNA and higher titers of transmis- 
sible transforming virus which is recovered from transfected 
cells (Table 1). These results suggest that the presence of the 
gag-pol intron is not essential, when using the retroviral 
vectors described here, for efficient expression of genes 
substituted in place of the retroviral env gene. 

Through the use of retroviral expression vectors which we 
have constructed, we have been able to assay the biological 
activity of N-terminal deletions of v-mos. The results of this
work show that some region between the third and fourth 
initiation codons is absolutely required for induction of focus 
formation by v-mos. Thus, the remainder of the gene down-
stream from the third ATG codon possesses all regions
requisite for biological activity. This defines the N-terminal 
limit of the v-mos gene necessary for focus formation. 

Previous delimitation of the v-mos gene required for 
biological activity has relied on altered recombinational 
events of the mos gene (6, 13, 14, 16, 40, 42). Myeloprolifera-
tive sarcoma virus is a member of the MSV family originat-
ing upon serial transplantation of a tumor induced in a
newborn mouse by uncloned MSV with M-MLV as a helper 
(42). In the myeloproliferative sarcoma virus genome, a 
frameshift mutation in the N-terminal coding region pro- 
duces a truncated gene product of 19 amino acids upon 
initiation of translation at the first ATG of the v-mos gene. 
However, in myeloproliferative sarcoma virus, initiation can 
begin at the second ATG of mos and continue through the 
proper C terminus, presumably generating the gene product 
responsible for transformation. This work with 
myeloproliferative sarcoma virus places the active region of 
the v-mos gene beyond the second ATG. 

Another altered recombinational event involving the mos 
gene was discovered by Canaani et al. when a rearranged 
c-mos gene (rc-mos) was isolated from a mouse myeloma
tumor (7, 9, 40). The rearrangement involved the deletion of
263 nucleotides from the 5' coding sequence of c-mos and
the substitution of this information with a new cellular DNA
sequence. Thus the gene product from rc-mos has an N
terminus of 28 amino acids donated by the substituted
cellular sequence while the remainder of the protein is from
the c-mos gene. The point of recombination is at the 73rd
amino acid of v-mos, which lies between the second and
third ATG codons of the 124-MSV v-mos gene. This work
demonstrates that the transforming potential of c-mos lies
downstream of the 73rd amino acid of the full-length gene
product (7, 9, 40). Our work is consistent with these earlier
observations by placing the N-terminal limit of the biologi- 
cally active region of the v-mos gene product at the third 
ATG, which codes for the 97th amino acid of the v-mos 
protein. 

Using a termination oligonucleotide, we have shown that 
some portion of the 23 amino acids at the C terminus of 
p37mos is necessary for biological activity. The sites of
recombination among different variants of M-MSV have
provided the only previous delimitation of the C-terminal
active region of v-mos (13, 14). The m1-MSV strain has a
small internal deletion at the C terminus which results in an
altered amino acid sequence beyond the site of deletion (6).
Figure 4 shows the point of divergence between m1-MSV
and 124-MSV. However, of the C-terminal 23 amino acids
which we have shown to contain a region necessary for
transformation, 17 are conserved in m1-MSV. Since only
those variants of MSV which still possess the transforming 
capabilities of v-mos will be isolated, the recombinational 
site at the C terminus can be used to delimit the region 
necessary for biological activity (6, 13, 14). Our work, as
well as the absence of MSV variants with C-terminal 
recombinational sites upstream of these terminal 23 amino 
acids, demonstrates that some portion of this sequence is 
required for biological activity of v-mos. 

The v-mos gene product is one of several oncogenic 
proteins which show homology to the bovine cAMP- 
dependent protein kinase but possess no specific tyrosine 
kinase activity (3, 27). There are several specific regions 
within the v-mos gene product which show conservation of 
residues with the protein kinase (Fig. 4). Directly down- 
stream of the third ATG codon is a sequence of nine amino 
acids, of which six are conserved with the cAMP-dependent 
protein kinase (5). Figure 4 shows the N-terminal endpoint of
the smallest deletion mutant, pRB5, which is incapable of
transformation. This mutant initiates translation at the fourth
in-frame ATG when expressed in the vector pDD102.

FIG. 4. Location of v-mos mutants and relationship to homology with cAMP-dependent protein kinase. The amino acid sequence of p37mos
is shown (in one-letter code) in alignment with the catalytic subunit of cAMP-dependent protein kinase (cat) (3). Asterisks above the mos
sequence denote the five methionine residues within the protein. The endpoint of deletion mutant pRB5 represents the smallest deletion which
destroys biological activity. The site of termination linker insertion in pRB26 is also shown. This leads to truncation of 23 amino acids from
the C terminus of p37mos and the loss of biological activity. The point of divergence between 124-MSV and m1-MSV is also shown. The closely
related strain m1-MSV encodes a gene product with a slightly altered C terminus, due to a shift in the reading frame near the C terminus. As
a result, the gene product of m1-MSV retains only the first 17 residues of the 23 C-terminal residues of the 124-MSV gene product.

It should be noted that a presumptive ATP-binding site in
p37mos lies within the region which is required for transfor-
mation, as shown by the deletion analysis presented here.
This presumptive ATP-binding site, which includes lysine
residue 121 of p37mos, occurs just downstream of the third
methionine. A similar ATP-binding site occurs in the cata-
lytic subunit of the cAMP-dependent protein kinase and also
in p60src, and site-directed mutagenesis of lysine 121 of
p37mos indicates that this residue is functionally important
for transformation by v-mos (M. Hannink and Daniel J.
Donoghue, manuscript submitted for publication).

Figure 4 also shows the site at which the biologically
inactive C-terminal mutant pRB26 terminates translation
because of the termination linker insertion. Beyond the site
of premature termination, another region of v-mos is highly
conserved with the protein kinase in which five of six
residues are identical (reference 3; Fig. 4). This sequence
homology lies within the C-terminal region which is required
for transformation by v-mos. Although the v-mos gene
product function has not been identified, it appears to
require regions which have conserved amino acid sequences
with cAMP-dependent protein kinase.

The normally quiescent c-mos gene, when acquired and 
expressed by the M-MSV retrovirus, becomes acutely trans-
forming (5, 7, 9, 16). Since the function of the v-mos gene
product has yet to be determined, neither its original role as
a c-mos gene product nor its method of transformation when
expressed by a retrovirus is known (12, 27, 35). Our work
delineates the N- and C-terminal regions of the mos gene
requisite for biological activity and consequently the se-
quences which are indispensable for an active v-mos gene
product.
Analysis of the biological and biochemical activities of pp60v-src proteins encoded by 12 carboxyl-
terminal mutants showed that a wide family of alternate src carboxyl termini permit complete transforming
and kinase activities. src proteins having carboxyl termini which are up to 10 amino acids longer than that of
pp60c-src (17 amino acids longer than that of pp60v-src) still permit transformation. Transformation-positive
mutations preserve leucine-516, a residue which is highly conserved in protein-tyrosine kinase sequences;
removal causes in vivo protein instability. Successive deletion mutants show that this residue is at the boundary
of a region required for kinase activity. pp60v-src which is truncated just outside this point still transforms cells
and binds both pp50 and pp90 cellular proteins.



The v-src retroviral oncogene arose by transduction and
modification of the c-src cellular gene (for a review, see
reference 3). Unlike the viral product pp60v-src, the cellular
product pp60c-src does not induce transformation even when
overexpressed in fibroblasts (23, 40, 50) although it can
induce focus formation at high expression levels (24). This
may be due to the reduced protein-tyrosine kinase activity of
pp60c-src relative to pp60v-src (7, 10, 22). The catalytic domain
for this activity resides in the carboxyl half of the molecule
which is strongly homologous to domains in the other
sequenced protein-tyrosine kinases (for a review, see refer-
ence 21). Two cellular phosphoproteins, pp90 (a major
cytoplasmic heat shock protein [38]) and pp50, specifically
interact with pp60v-src (4) much more extensively than with
pp60c-src (22).

Structurally, the viral and cellular proteins differ by the
substitution of 12 different carboxyl-terminal amino acids in
pp60v-src for the 19 carboxyl-terminal amino acids of pp60c-src
and, depending on the strain of v-src, 8 to 15 isolated
substitutions (11, 46, 59) scattered throughout the remainder
of the proteins. In pp60c-src, this carboxyl-terminal region
contains the major site of in vivo tyrosine phosphorylation
(6) which may play a role in negatively regulating pp60c-src
protein-tyrosine kinase activity (8). Most of the DNA se-
quence encoding the v-src carboxyl terminus is found in the
chicken genome about 900 base pairs (bp) downstream from
the c-src termination codon (57, 59).

The functional significance of the existence of multiple
mutations between v-src and c-src has not yet been resolved.
Single point mutations in pp60c-src enable it to transform
chicken embryo cells (J. B. Levy, H. Iba, and H. Hanafusa,
Proc. Natl. Acad. Sci. USA, in press), and chimeric genes
encoding the amino region of pp60c-src and carboxyl region of
pp60v-src (c/v-src chimeras) or encoding the amino region of
pp60v-src and the carboxyl region of pp60c-src (v/c-src chime-
ras) both induce foci in chicken or mouse cells (23, 50, 65).
However, the transforming activities of these v/c-src genes
are restricted in NIH 3T3 cells in which, in contrast with
c/v-src chimeras, plasmid-mediated gene expression confers
only weak anchorage-independent growth and tumorigenic-
ity in syngeneic animals to the recipient cells (E. P. Reddy et
al., manuscript in preparation). In addition, v/c-src chimeras
containing Schmidt-Ruppin D (SR-D) strain v-src and the
c-src carboxyl-terminal regions have highly reduced focus-
forming activities relative to unmodified SR-D v-src genes
when tested in NIH 3T3 cells (49).

We studied the carboxyl-terminal region using a series
of deletion, substitution, and addition mutants of SR-D
pp60v-src. We report that a wide variety of SR-D pp60v-src
carboxyl-terminal mutants can transform fibroblasts efficiently
and completely. In addition, we show that the abrogation of
transforming activity generated by previously studied pp60v-src
carboxyl-terminal mutants (41, 65) is due to replacement of
leucine-516 and that removal of this residue, which is highly
conserved among the protein-tyrosine kinases, results in
pp60v-src instability in vivo.

MATERIALS AND METHODS 

Plasmid constructions. Plasmids were constructed by stan-
dard recombinant DNA techniques (31). psrc11 was a gift
from G. Cooper and A. Zelenetz.

pRS4 was derived from psrc11 by a three-fragment ligation
which resulted in the replacement of the 1,765-bp psrc11
PvuII-EcoRI sequence with a 1,724-bp PvuII-EcoRI se-
quence originally derived from simian virus 40 (SV40) (the
DNA was physically purified from pSV3gpt [34]). The in-
serted SV40 sequence encodes the 11 terminal amino acids
of pRS4 (in a reading frame not used in SV40) and contains
the SV40 early-region polyadenylation site (60). pRS4 has a
unique HindIII site immediately upstream of the pRS4 src
termination codon.

Plasmids pRS4-9, pRS4-11, pRS4-27, pRS4-18, pRS4-19,
pRS4-14, and pRS4-28 were isolated from a shotgun ligation
of pSV40RI HindIII fragments into the unique HindIII site of
pRS4. pSV40RI, generously provided by R. Frisque (13), is
a clone of EcoRI-linearized SV40 into the pBR322 EcoRI site
with the SV40 late region and pBR322 ampicillin resistance
gene in the same orientation. The inserted fragments (iden-
tified in Fig. 1c) are equivalent to SV40 HindIII fragments
except that fragment A (using conventional SV40 fragment
notation [60]) contains 31 bp from pBR322 between its
EcoRI and HindIII sites instead of the corresponding 69 bp
from the SV40 HindIII A fragment. Fragments identified by
letters without primes are oriented such that their SV40
coordinates increase from left to right. Fragments with
primes (') are inserted in the opposite orientation.
pRS4-28 (not shown) has the SV40 HindIII D, D, and C
fragments inserted in that order into the pRS4 HindIII site. It
encodes the same protein as pRS4-18, but its polyadenyla- 
tion site is 1,644 bp further downstream than the one in 
pRS4-18. pRS4-28 induced foci with the same efficiency as 
pRS4-18 (unpublished data), suggesting that it was unlikely 
that the smaller variations in polyadenylation site location in 
the other constructs had affected their focus-forming effi- 
ciencies. 

pRS7 was constructed by ligating the 804-bp HindIII-HpaI
pBR322-JC virus chimeric fragment gel purified from
pMad1-TC (the fragment contains pBR322 nucleotides 31 to
0 [42, 55] and JC virus nucleotides 1724 to 951 [14]) between
the HindIII and HpaI sites of pRS4-14, a plasmid identical to
pRS4 (Fig. 1c) in the included region of DNA. pRS6 was
constructed by a parallel insertion of the 546-bp pBR322
HindIII-ScaI fragment (pBR322 nucleotide positions 31 to
3846 [42, 55]) at the same location.

The constructions of pRS5, pRS9, pRS8, and pRS11
resulted in replacement of the 30-bp PvuII-HindIII fragment
of pRS4 (Fig. 1c) with gel-purified fragments from other
sources. pRS4-14 (see above) was used to gel purify the
pRS4-homologous fragment and contributed an additional
6-bp pBR322 ClaI-HindIII fragment adjacent to the pRS4
HindIII site in the construction of pRS11. The inserted
fragments were: pRS5, 188-bp SV40 StuI-HindIII fragment
(nucleotide positions 1234 to 1046 [60]); pRS9, 262-bp BK
virus MnlI-HindIII fragment (nucleotide positions 4385 to
4638 [48]; this fragment was purified from pBKV-9, a
generous gift of R. Frisque [Frisque et al., submitted for
publication]); pRS8, 131-bp pBR322 AluI-HindIII fragment
(nucleotide positions 31 to 162 [42, 55]); pRS11, 102-bp
lambda cI857 AluI-TaqI fragment (nucleotide positions
31055 to 31157 [16]). Recombinants were screened and
identified by multiple restriction enzyme digests.

Tissue culture and tissue culture assays. Transfection and 
growth of NIH 3T3 mouse cells (50) and procedures for 
isolating focus-selected and Eco-gpt-coselected cell lines
(24) have been previously described. Coselection with
pSVlneo was as described for Eco-gpt coselection, except
the normal medium was supplemented with 400 µg of G-418
(Geneticin; GIBCO Laboratories, Grand Island, N.Y.) per
ml for isolation of colonies and 200 µg/ml for maintenance of
established cell lines (53). Polyclonal mass cell cultures were
selected by growing cells from 100-mm tissue culture plates
containing 50 to 70 colonies to confluence in the presence of
400 µg of G-418 per ml.

Immunoprecipitations. Cells were metabolically labeled
(see below) and lysed in 0.5 ml of RIPA buffer (1% Triton
X-100, 1% sodium deoxycholate, 0.1% sodium dodecyl
sulfate [SDS], 150 mM NaCl, 20 mM NaH2PO4) supple-
mented with 1 mM phenylmethylsulfonyl fluoride, 2 mM
EDTA, 50 mM NaF, 0.2 mM Na3VO4, and 100 KIU of
aprotinin (Sigma Chemical Co., St. Louis, Mo.) per ml. This
RIPA buffer is a good phosphatase inhibitor (15). EDTA,
NaF, and Na3VO4 were included to further inhibit protein
kinases and phosphatases. Lysates were clarified at 25,000 x
g for 30 min. Portions containing equal amounts of trichlo-
roacetic acid (TCA)-insoluble radioactivity were adjusted to
constant volume and preabsorbed twice with 50 µl of a 10%
suspension of fixed protein A-containing Staphylococcus
aureus bacteria for 5 min at 0°C. Proteins were im-
munoprecipitated in an excess of antibody from preabsorbed
lysates with 1 µl of monoclonal antibody 327 (29) for 45 min
at 0°C. Immune complexes were collected on 30 µl of 10% S.
aureus suspension that had been precoated with 1 µg of
anti-mouse immunoglobulin G (heavy plus light chains)
(Miles Laboratory, Inc., Naperville, Ill.) by 20 min of
incubation at 0°C, washed once with high-salt buffer (1 M
NaCl, 0.5% Triton X-100, 10 mM Tris hydrochloride [pH
7.2]), and washed twice with RIPA buffer. The washed
pellets were suspended in 25 µl of electrophoresis sample
buffer and analyzed by electrophoresis on 7.5% SDS-
polyacrylamide gels with the Laemmli (28) buffer system.
[35S]methionine-labeled proteins were detected by fluo-
rography of dried gels after treatment with En3Hance (New
England Nuclear Corp., Boston, Mass.). 32P-labeled pro-
teins were detected by autoradiography of dried gels with Du
Pont Cronex Lightning-Plus intensifying screens.

Protein kinase specific activity assays. Cells were labeled
for 40 h with [35S]methionine as described below. Proteins
were immunoprecipitated as described above except that the
immune complexes bound to protein A-containing S. aureus
bacteria were washed twice with 1 ml of RIPA buffer, twice
with 1 ml of high-salt buffer, suspended in 1 ml of phosphor-
ylation buffer (20 mM Tris hydrochloride [pH 7.2], 5 mM
MnCl2, 2 mM 2-mercaptoethanol), and split into two 450-µl
portions. After low-speed centrifugation, the pellets were
resuspended in 20-µl volumes of phosphorylation buffer
containing 0.1 mg of rabbit muscle enolase (Sigma) per ml.
One portion from each sample was supplemented with 1 µM
[γ-32P]ATP (specific activity diluted to 750 Ci/mmol; New
England Nuclear Corp.). Both portions were incubated at
23°C for 15 min, washed, and analyzed by SDS-polyacryl-
amide gel electrophoresis as described by Coussens et al.
(10). [35S]methionine-labeled proteins in the samples which
did not contain [γ-32P]ATP were detected by fluorography as
described above. 32P-labeled proteins were detected by
autoradiography as described above except that two sheets
of aluminum foil were placed between the dried gel and the
X-ray film to block the 35S radiation. Control experiments
showed that over 95% of the film exposure with this arrange-
ment was due to radiation emitted from 32P.

Metabolic labeling. Cells were plated at 10^ cells per 35-
mm plate 24 h before labeling. For short-term [35S]methi-
onine labeling, cells were incubated for 1 h in methionine-
free minimal essential medium (GIBCO) plus 10% calf serum
and then incubated in 0.33 ml of methionine-free medium
plus 10% calf serum and 300 µCi of [35S]methionine (>1,000
Ci/mmol; New England Nuclear Corp.) per ml for 1 h. Cells



FIG. 1. r-src and v-src expression plasmids. All modified src genes are expressed with identical upstream regions containing Rous sarcoma
virus long terminal repeats (RSV LTR), splice donor (SD) and splice acceptor (SA) sequences, and downstream sequences containing the
SV40 early-region polyadenylation site (pA). psrc11 has the same upstream region but uses a downstream Rous sarcoma virus long terminal
repeat for polyadenylation. (a) SR-D v-src expression plasmid psrc11. (b) r-src expression plasmid pRS4. psrc11 and pRS4 are both 7.8
kilobases (kb) long. Large dots, Rous sarcoma virus long terminal repeat; small dots, pBR322 vector sequence; hatched, v-src coding
sequence; solid, nine-amino acid coding sequence substitution. (c) Modified regions of r-src DNAs. Only the modified 3' regions are shown;
all other plasmid regions are identical to those shown in panels a and b. The sources of the DNAs used in the recombinant constructions are
indicated (also see Materials and Methods). SV40 HindIII fragments are identified by suffixes (primes denote fragments inserted in opposite
orientation to the conventional SV40 map [60]). Al, AluI; Bg, BglII; Cl, ClaI; H3, HindIII; Hp, HpaI; H2, HpaII; Kp, KpnI; Mn, MnlI; P2,
PvuII; Ps, PstI; RI, EcoRI; Sc, ScaI.
TABLE 1. Transforming activities of pp60v-src carboxyl-terminal mutants

Plasmid   Mutation                            Relative focus-      Growth of coselected
                                              forming activity(a)  expresser cells in
                                                                   soft agarose(b)

psrc11    v-src                               1.0                  ++
pRS9      COOH)                               <0.0005
pRS8      Deletion (Leu-517 to COOH)          0.2                  ++
pRS11     Deletion (Pro-518 to COOH)          0.3                  ++
pRS5      Substitution (Leu-516 to COOH)      <0.0005              ND
pRS4      Substitution (Pro-518 to COOH)      1.1                  ++
pRS4-9    Addition (4 aa(c) to                0.9                  ND
pRS4-11   Addition (4 aa to pp60v-src)        0.8                  ND
pRS4-27   Addition (5 aa to pp60v-src)        1.3                  ND
pRS4-18   Addition (12 aa to pp60v-src)       1.1                  ND
pRS7      Addition (14 aa to pp60v-src)       1.0                  +
pRS6      Addition (17 aa to pp60v-src)       0.3                  +
pRS4-19   Addition (18 aa to pp60v-src)       0.002

(a) All values are averages from at least three experiments of focus-forming
activities relative to psrc11. The average psrc11 focus-forming activity was 2.6
x 10* foci per pmol.

(b) Multiple cloned cell lines or mass-culture cells that had been cotransfected
with each r-src plasmid and either neo or Eco-gpt plasmids were coselected,
tested for pp60v-src expression by immunoprecipitation (e.g., see Fig. 4), and
assayed for anchorage-independent colony formation in medium containing
0.3% agarose. ++, Phenotype displayed in psrc11 panel of Fig. 3 (10 to 50%
colony formation; average colony size after 14 days of 0.34 mm); +,
intermediate colony formation (10 to 50% colony formation; average colony
size of 0.17 mm); -, phenotype displayed in NIH 3T3 panel (0% colony
formation) of Fig. 3. ND, Not done.

(c) aa, Amino acids.

for long-term [35S]methionine labeling were incubated in 1.0
ml of methionine-free minimal essential medium plus 5%
complete Dulbecco modified Eagle medium (GIBCO), 10%
calf serum, and 100 µCi of [35S]methionine per ml for 14 to 40
h. Cells were labeled with 32P by incubation in 80% phosphate-free
minimal essential medium-20% normal Dulbecco
modified Eagle medium plus 5% calf serum for 8 h followed
by incubation in phosphate-free medium containing 5%
serum and 0.5 mCi of 32P (ICN Radiochemicals, Inc., Irvine,
Calif.) per ml for 12 to 14 h.

Determination of relative protein synthesis rates, equilibrium
levels, and stabilities. Relative pp60src synthesis rates
were determined from the ratios of radioactivities present in
pp60src bands immunoprecipitated from cell lysates containing
equal amounts of TCA-precipitable radioactivity after a
short (1 h) [35S]methionine labeling pulse (see above and Fig.
5). This period is long enough for the intracellular amino acid
pool to come to equilibrium (45) but short compared with the
half-lives of the pp60src proteins (previously estimated to be
8 h [22, 47, 68]). Relative pp60src equilibrium levels were
determined from similar comparisons by using cells labeled
for a long period (27 h) compared with the pp60src half-life.
Extremely long labeling periods were avoided to reduce the
possibility of metabolic perturbations from the reduced
methionine concentrations required for labeling. All the cell
cultures were plated and grown in parallel under identical
conditions for the 48-h period before lysis except that
[35S]methionine was added at either 27 or 1 h before parallel
culture lysis and immunoprecipitation.

For steady-state conditions, the protein synthesis rate per
cell ($k_s$), equilibrium level per cell ($P_{eq}$), and turnover rate
constant ($k_d$) are related by $P_{eq} = k_s/k_d$. [For growing cells
this must be modified to $P_{eq} = k_s/(k_d + g)$, where $g = \ln 2/$cell
doubling time (e.g., see reference 27), but since the pp60src
half-life is much less than the cell doubling time (about 24 h),
$g \ll k_d$ and can be ignored.] Although the absolute values
of these constants were not determined, the turnover
rates of the pp60r-src variants relative to that of
pp60v-src were determined from

$$\frac{k_d^{r\text{-}src}}{k_d^{v\text{-}src}} = \frac{k_s^{r\text{-}src}/k_s^{v\text{-}src}}{P_{eq}^{r\text{-}src}/P_{eq}^{v\text{-}src}}.$$

Relative stabilities were defined as the inverses of relative
turnover rates.
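
As a rough illustration of the calculation just described, the following Python sketch applies the steady-state relation (equilibrium level = synthesis rate / turnover rate constant) to the relative synthesis rates and equilibrium levels listed in Table 3. The function name and the dictionary layout are illustrative assumptions rather than part of the original methods, and the growth-dilution term g is ignored, as in the text.

```python
# Minimal sketch: relative stability = relative equilibrium level divided by
# relative synthesis rate, normalized to the psrc11 (v-src) control.
# Input values are the (synthesis rate, equilibrium level) pairs from Table 3.

def stability(rel_synthesis_rate, rel_equilibrium_level):
    """Return P_eq / k_s, which is proportional to 1 / k_d at steady state."""
    return rel_equilibrium_level / rel_synthesis_rate

# (relative synthesis rate, relative equilibrium level) per mass culture
table3 = {
    "psrc11": (1.0, 0.2),
    "pRS11": (6.0, 0.7),
    "pRS8": (6.0, 0.6),
}

reference = stability(*table3["psrc11"])

# Stability of each protein relative to that of the psrc11 product
relative_stability = {
    name: stability(k_s, p_eq) / reference
    for name, (k_s, p_eq) in table3.items()
}

for name, value in relative_stability.items():
    print(f"{name}: relative stability = {value:.1f}")
```

Rounded to one decimal place, these ratios agree with the relative stabilities listed for pRS11 and pRS8 in Table 3 (0.6 and 0.5).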

RESULTS 

The specific pp60v-src carboxyl terminus is not required for
transformation. Plasmid psrc11 expresses SR-D pp60v-src and
efficiently transforms NIH 3T3 mouse cells (50; G. Cooper
and A. Zelenetz, personal communication). To determine
whether the specific amino acids near the carboxyl terminus
of pp60v-src were required for focus-forming activity, psrc11
was modified to encode a protein in which the nine terminal
amino acids of pp60v-src were replaced by nine unrelated
amino acids. Transfection of this plasmid, pRS4 (Fig. 1 and
2), into NIH 3T3 cells induced wild-type focus formation
(Table 1). Foci and cells transformed by pRS4 were similar
in morphology to those transformed by psrc11 and displayed
wild-type growth in soft agarose and in vivo tumorigenicity
in adult and newborn NFS mice (Table 2). In addition, pRS4

TABLE 2. In vivo tumorigenicity of pp60v-src carboxyl-terminal
mutants(a)

Cell line         No. of tumors/
                  animals injected

NIH3T3            0/9
C57(psrc11)       2/4
NIH(pRS4).G       3/7
NIH(pRS4).H       6/8
NIH(pRS4).I       6^
C57(pRS4)         2/4
NIH(pRS8).K       5/5
NIH(pRS8).L       5/5
NIH(pRS11).K      2/7, 4/7(b)
NIH(pRS11).L

(a) Tumorigenicity was tested by injecting 1 x 10* NIH 3T3 cells or 5 x 10*
C57BL/6J cells subcutaneously into newborn NFS mice or adult C57BL/6J
mice, respectively. All tumors appeared within 2 to 3 weeks with the
exception of one C57(pRS4) tumor that appeared within 30 days. No tumors
appeared in NFS newborn mice injected with normal NIH 3T3 cells or
C57BL/6J adult mice injected with normal C57BL/6J cells within the
observation period of 2 months for NIH 3T3 mice or 30 days for C57BL/6J mice.

(b) Two independent experiments were performed with the NIH(pRS11).K cell line.

FIG. 3. Biological comparison of pp60src carboxyl mutant expresser cell lines. M, Foci formed in monolayer culture after 14 days by NIH
3T3 cells transfected with plasmids psrc11, pRS11, and pRS8. No foci were observed in cells transfected with pRS9. SA, Growth in medium
containing 0.3% soft agarose 12 days after suspension of G-418-resistant mass-culture cells cotransfected with pSV2neo and the indicated
plasmids.






FIG. 4. Immunoprecipitation of pp60src from coselected expresser cell lines. NIH 3T3 cells were cotransfected with r-src expression and
coselectable marker gene plasmids. Coselected lines were labeled for 1 h with [35S]methionine, and cell lysates containing equal amounts of
TCA-insoluble radioactivity were immunoprecipitated with anti-src monoclonal antibody 327 (29) as described in Materials and Methods.
Names within parentheses identify the transfected r-src plasmids. All cell lines were coselected with pSV302, an Eco-gpt (34) expression
plasmid, except NIH(pRS9).K, which was coselected with pSV2neo (53). NIH(psrc11) is a focus-selected cell line, and NIH(gpt) is a selected
cell line which was transfected only with pSV302. α-m is NIH(psrc11) lysate that was immunoprecipitated with only rabbit anti-mouse
immunoglobulin G antibody and no monoclonal antibody. M.W., Molecular weight standards (×10^3). Exposure time = 4 days.

FIG. 5. Comparison of pp60src synthesis rates and equilibrium levels. Normal and mutated pp60src proteins were immunoprecipitated
with monoclonal antibody 327 from G-418-resistant mass-culture cells cotransfected with pSV2neo and the indicated plasmids.
Immunoprecipitations were performed with cell lysates containing equal amounts of TCA-insoluble radioactivity prepared from cells labeled
with [35S]methionine for 1 h (S) or 27 h (L) as described in Materials and Methods. NIH(psrc11) is a focus-selected cell line not containing
pSV2neo, while NIH(pSV2neo).MC is a mass-culture line transfected only with pSV2neo. M.W., Molecular weight standards (×10^3).
Exposure time = 5 days.



was cotransfected with pSV302 (an Eco-gpt expression
plasmid) into C57BL/6J 3T3 cells, and cells selected for
Eco-gpt expression (34) were found to be tumorigenic in
syngeneic mice (Table 2). Purified pRS4 DNA was
tumorigenic upon direct injection into chicken wing webs (D.
Robinson and H.-J. Kung, personal communication). The
specific kinase activity of pp60RS4 was similar to that of
pp60v-src for both immunoglobulin G heavy chain and
enolase phosphorylation (data not shown). The 3' coding
sequence of pRS4 was verified by sequencing (T. Kmiecik,
unpublished data).

Limits on pp60src carboxyl termini which permit transformation.
Plasmids encoding 11 additional carboxyl-terminal
mutants (Fig. 2) were constructed by replacing the 3' end of
the v-src coding sequence with gel-purified DNA restriction
fragments from numerous sources (Fig. 1) (Materials and
Methods). Fragments encoding the desired 3' sequences and
having appropriate restriction cleavage sites were located by
computer search of sequence and plasmid data bases (51).
These plasmids all contain the same upstream and downstream
control sequences (Rous sarcoma virus long terminal
repeat and SV40 early polyadenylation signal) as pRS4.

Three classes of recombinant src (r-src) coding sequences
were constructed: (i) elongation mutants in which additional
sequence was progressively appended to the pRS4 coding
sequence (pRS4-9, pRS4-11, pRS4-27, pRS4-18, pRS7,
pRS6, pRS4-19), (ii) substitution mutants having the same
length as v-src (pRS4 and pRS5), and (iii) deletion mutants in
which the v-src coding sequence was progressively truncated
at the 3' end (pRS11, pRS8, and pRS9). Plasmid
focus-forming activities were determined in multiple
transfection experiments in NIH 3T3 cells (Table 1).

Progressive extension of the r-src coding sequence (Fig. 2)
did not reduce focus-forming activity until the expressed
protein was 17 amino acids longer than normal pp60v-src
(pRS6), at which point a threefold reduction in efficiency was
observed. Further extension (pRS4-19) almost completely
eliminated focus-forming activity (Table 1).

Replacement of the last 11 (pRS5) rather than the last 9
(pRS4) amino acids of pp60v-src with unrelated residues
eliminated focus-forming activity, suggesting that residue
Leu-516 or Leu-517 or both were required for transformation.
Progressive 3' truncation of the v-src coding sequence
reduced focus-forming activity, but relatively high activity
was observed as long as leucine-516 was preserved (pRS11
and pRS8) (Table 1). Truncation of leucine-516 and downstream
residues completely eliminated focus-forming activity
(pRS9). Foci induced by pRS8 and pRS11 were usually
less pronounced than those induced by psrc11 (Fig. 3).

To verify that even those r-src plasmids which did not
cause focus formation were inducing synthesis of the
pp60src proteins, the r-src plasmids were cotransfected in a
10:1 molar ratio with plasmids expressing the neo or Eco-gpt
genes, and multiple coselected cell lines were isolated with
appropriate selective media (35, 53). Four or more cell lines
were isolated for each src expression plasmid tested except
for pRS5 and pRS7, for which only two cell lines each were
isolated; expression plasmids pRS4-9, pRS4-11, pRS4-27,
and pRS4-18 were not tested in this manner. The src proteins
were immunoprecipitated from lysates of [35S]methionine-labeled
cells by using monoclonal antibody 327 (29); we
expected this antibody to bind the r-src proteins independently
of their carboxyl termini since it efficiently precipitates
both pp60v-src and pp60c-src (30). These experiments
showed that pp60src was being efficiently synthesized in the
different cell lines (Fig. 4).

The coselected r-src expresser cell lines were tested for
colony formation in medium containing soft agarose (Table
1). Anchorage-independent growth was completely correlated
with focus formation except for cells expressing the
longest r-src protein (pp60RS4-19). Although pRS4-19 did
not induce foci, two of four coselected NIH(pRS4-19) lines
formed colonies in soft agarose.

Requirement of leucine-516 for src protein stability. Mass
cultures of cells expressing the truncated src proteins were
selected with G-418 after cotransfection of pRS11, pRS8,
and pRS9 with pSV2neo. Mass cultures selected after
cotransfection with psrc11 and pSV2neo and after pSV2neo
transfection alone were used for positive and negative controls.
Each culture contained cells from about 50 independent
G-418-resistant colonies. The growth in soft agarose of
these cultures is compared in Fig. 3. In contrast with the
reduced focus-forming efficiency (Table 1) and induced focus
size (Fig. 3) of pRS11 and pRS8 relative to psrc11, they
induced comparable anchorage-independent growth. No colonies
were observed in multiple experiments with the
pRS9-cotransfected mass culture cells.

pp60src protein synthesis and equilibrium levels were
compared (Fig. 5) by immunoprecipitation from cells labeled
for periods which were short (1 h) or long (27 h) relative to
the apparent pp60v-src half-life (about 8 h for SR strains of
v-src [22, 47, 68]). All the r-src proteins were synthesized at
rates comparable to or higher than that of pp60v-src, but the
equilibrium level of pp60RS9 was very low. Quantitative
comparison showed that pp60RS9 is at least five times less


FIG. 6. Specific kinase activities of r-src proteins. The specific
activities of pp60v-src and pp60r-src proteins for autophosphorylation
and enolase phosphorylation were determined with protein bound
with monoclonal antibody 327 (29). Cells were labeled with 100 µCi
of [35S]methionine per ml for 40 h, and lysates containing 250 µg of
TCA-precipitable total cell protein were immunoprecipitated as
described in Materials and Methods. The S. aureus-bound immune
complexes were suspended in phosphorylation buffer and split into
two portions. Both aliquots were processed identically in the kinase
assay except that [γ-32P]ATP was included only in one set of
reactions. NIH(pRS9/neo).MC cells do not have a high enough
equilibrium level of pp60RS9 to permit specific activity quantitation.
(A) Exogenous substrate kinase assay. The portions containing
[γ-32P]ATP in the reaction mixture were analyzed by 10%
SDS-polyacrylamide gel electrophoresis and 4-day exposure of the dried
gel with an enhancing screen. Two sheets of aluminum foil were
placed between the gel and the film to prevent exposure from the 35S
radioactivity (see Materials and Methods). Some differences between
[32P]enolase band intensities are masked in this autoradiograph,
which is overexposed to display weaker autophosphorylation
bands. (B) [35S]methionine immunoprecipitation. The portions not
containing [γ-32P]ATP were analyzed by 1.5% SDS-polyacrylamide
gel electrophoresis. The gels were treated with En3Hance, dried,
and exposed for 4 days without aluminum foil. M.W., Molecular
weight (×10^3).

stable than pp60v-src (Table 3). pp60RS11 and pp60RS8
are only moderately less stable than pp60v-src.

Different effects of pp60src carboxyl-terminal mutation on
kinase activity and pp50-pp90 binding. The kinase activities
of the carboxyl deletion mutant pp60src proteins bound in
an immune complex with monoclonal antibody 327 were
compared with enolase as a substrate and [γ-32P]ATP as a
phosphate donor. [35S]methionine-labeled pp60src was used
so that specific activities could be accurately determined by
double-label scintillation counting (see Materials and Methods).
Removal of leucine-517, one amino acid beyond leucine-516,
caused a decrease in kinase activity (compare
activities of pp60RS11 and pp60RS8; Fig. 6 and Table 3).
The specific kinase activity of pp60RS9 could not be
measured because of its low equilibrium level.

The molar phosphorylations of the deletion mutant proteins
were compared by immunoprecipitating pp60src proteins
from sister cultures which had been labeled with either
32P or [35S]methionine. Comparison of the immunoprecipitated
bands showed that pp60RS11 and pp60RS8 were
phosphorylated at least as much as psrc11 (Fig. 7). The
extent of pp60RS9 phosphorylation could not be quantitatively
determined because of its low equilibrium level.

All three deletion mutants bound pp50 and pp90 efficiently.
These proteins comigrated with pp50 and pp90
bound to pp60v-src in LA90 3T3 cells (generously provided by
J. Brugge). The identities of the pp90 bands were confirmed
by comparing S. aureus V-8 protease digests with those of
pp90 from the LA90 3T3 cells. Both 35S and multiple 32P
comparisons indicated that the truncated proteins bind at
least as much and probably more of these cellular proteins
on a molar basis than pp60v-src (Table 3).

DISCUSSION 

The sequence encoding the pp60v-src carboxyl terminus
apparently evolved by recombination between the c-src
coding sequence and a cellular sequence found about 900 bp
downstream from the c-src termination codon (57, 59).
Homology of part of this cellular sequence to c-src suggests
that it may have originated by DNA duplication, but it is not
known whether it has ever been part of a protein-coding
region. Whether or not this specific sequence plays a role in
pp60v-src-induced transformation has been unknown. By
constructing and characterizing genes encoding 12 different
pp60v-src carboxyl-terminal mutants, we showed that transforming
activity is relatively insensitive to extensive replacement
or deletion of almost the entire region involved in the
v-/c-src divergence (Fig. 2; Table 1).

All the modified genes were expressed from homologous 
plasmids having identical enhancer-promoter, amino coding, 
and polyadenylation regions. The use of naturally occurring 
DNA sequences identified by computer screening rather 



FIG. 7. Coprecipitation of pp50 and pp90 with pp60src from 32P-
and 35S-labeled cells. Sister cultures of G-418-resistant mass-culture
lines and NIH(psrc11) were labeled with either 32P or
[35S]methionine as described in Materials and Methods. LA90 3T3
cells (which have been previously used to study pp50-pp90-pp60src
binding [J. Brugge, personal communication]) were labeled only
with 32P under identical conditions. Immunoprecipitates from lysates
containing equal amounts of TCA-precipitable 35S or 32P
radioactivity were made with monoclonal antibody 327 and analyzed
by separate 7.5% SDS-polyacrylamide gel electrophoresis as described
in Materials and Methods. (A) 32P-labeled cell lysates.
Exposure time = 22 h. (B) [35S]methionine-labeled cell lysates.
Exposure time = 10 days. M.W., Molecular weight (×10^3).

TABLE 3. Comparison of r-src proteins in coselected cell cultures

Coselected mass       Relative    Relative     Relative    Relative      Relative      p50/    p90/    pp50/    pp90/
culture cells         pp60src     pp60src      pp60src     autophos-     enolase       p60(d)  p60(d)  pp60(e)  pp60(e)
                      synthesis   equilibrium  stability   phorylation   phosphoryl-
                      rate(a)     level(a)     (b)         sp act(c)     ation
                                                                         sp act(c)

NIH(pSV2neo).MC       NM(f)       0.4          NM(f)       0.5           NM(f)         NM(f)   NM      NM       NM
NIH(psrc11/neo).MC    1           0.2          1           1             1             0.1     0.4     0.04     0.07
NIH(pRS11/neo).MC     6           0.7          0.6         0.8           0.8           0.2     0.4     0.1      0.2
NIH(pRS8/neo).MC      6           0.6          0.5         0.5           0.2           0.2     0.5     0.1      0.2
NIH(pRS9/neo).MC      0.7         <0.2         <0.2                      NM            0.2(g)  0.3(g)  NM       0.3(g)

(a) Values are from the experiment shown in Fig. 5. Relative synthesis rates and equilibrium levels were determined by comparing the amounts of radioactive
pp60src immunoprecipitated from cells labeled for times short (1 h) and long (27 h) relative to protein half-lives. All immunoprecipitations were performed with
lysates containing equal amounts of TCA-precipitable radioactivity. All values are relative to the amount of radioactivity in the NIH(psrc11).MC pp60v-src band.

(b) Relative stability (relative equilibrium level/relative synthesis rate) was determined as described in Materials and Methods. Stabilities are given relative to that
of pp60v-src.

(c) Values are from two experiments of the type shown in Fig. 6. Specific activities are normalized to pp60v-src activity.

(d) Values are from the experiment shown in Fig. 7B and give the amounts of radioactivity found in the p50 or p90 bands relative to the amounts of radioactivity
in the p60src bands obtained by immunoprecipitation from [35S]methionine-labeled cells (14-h label). All immunoprecipitations were performed with lysates
containing equal amounts of TCA-precipitable radioactivity.

(e) Values are from the experiment shown in Fig. 7A, which was executed and analyzed like that shown in Fig. 7B except that cells were labeled with 32P (14 h).

(f) NM, Not accurately measurable.

(g) Because of the low intensity of the signal, the value is only approximate.



than synthesized oligonucleotides permitted significant
experimental cost reduction. As expected, we found that the
different pp60src proteins were synthesized at roughly comparable
rates in cotransfected cells (Fig. 4). All the modified
proteins migrated as expected from their altered sizes except 
for pp60'^^^*'^ which migrated more rapidly than expected. 
This may be due to increased SDS binding to the rather 
hydrophobic tail of this protein. 

The most obvious difference between pp60v-src and
pp60c-src is in their sizes. We showed that the extended
length of pp60c-src does not by itself suppress transformation
since genes encoding proteins which are up to 10 amino acids
longer than pp60c-src (pp60RS6; Fig. 2) still induce efficient
focus formation (Table 1). The focus-forming activities of the
longer r-src proteins decreased monotonically with increasing
length (Table 1), but it is possible that these decreases
were sequence dependent. The appended peptides in these
constructs may be loosely attached to the surface of the
basic pp60src structure and probably have higher internal
flexibilities than normal protein structural regions. They may
reduce transforming activity by steric hindrance of active
sites.

Experiments done with both substitution (pRS4 and pRS5)
and deletion (pRS11, pRS8, and pRS9) mutations of various
extents (Fig. 1 and 2) showed that removal or (at least some)
replacement of leucine-516 abrogates transforming activity
(Table 1; Fig. 3). This is consistent with the result of
Wilkerson et al. (65), who showed that mutation of the
carboxyl-terminal 11 residues of pp60v-src (a region including
leucine-516) eliminates transforming activity.

As long as Leu-516 is preserved, reducing the length of the
carboxyl terminus does not reduce pp50 and pp90 binding;
rather, the data suggest that binding of these proteins is
increased (Table 3). Since binding is transient (5), the
apparent increases in specific binding might result from the
observed increases in pp60r-src turnover rates (Table 3) if the
binding period was of relatively fixed duration. In this case
the fixed binding period would occupy an increased fraction
of the reduced protein half-life. This hypothesis would also
account for the low pp50-pp90 specific binding to pp60c-src,
which has a longer half-life than pp60v-src (22). Alternatively,
removing the carboxyl terminus may directly increase affinity.
In either case, these results show that the carboxyl-terminal
nine amino acids of pp60v-src are not required for
pp50 and pp90 binding.
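
The fixed-binding-period hypothesis can be made concrete with a small numerical sketch. It assumes, purely for illustration, that each newly made src molecule stays bound to pp50-pp90 for a fixed time after synthesis and that the protein decays with first-order kinetics, so that the steady-state bound fraction is 1 - exp(-k_d * tau). The 0.5-h binding period and the 4-h half-life are hypothetical values chosen only to show the trend; only the 8-h half-life comes from the text.

```python
import math

def bound_fraction(half_life_h, binding_period_h):
    """Steady-state fraction of molecules still within the fixed binding period.

    Assumes first-order turnover, so molecule ages follow an exponential
    distribution with rate k_d = ln 2 / half-life.
    """
    k_d = math.log(2) / half_life_h
    return 1.0 - math.exp(-k_d * binding_period_h)

TAU_H = 0.5  # hypothetical fixed binding period (hours), not from the source

stable = bound_fraction(8.0, TAU_H)    # half-life of about 8 h, as cited for v-src
unstable = bound_fraction(4.0, TAU_H)  # hypothetical faster-turnover mutant

print(f"bound fraction, 8-h half-life: {stable:.3f}")
print(f"bound fraction, 4-h half-life: {unstable:.3f}")
print(f"ratio: {unstable / stable:.2f}")
```

Under these assumptions, halving the half-life roughly doubles the fraction of protein found in the complex, which is the direction of the effect proposed in the text.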

The in vitro exogenous substrate-specific kinase activity
of pp60RS11 (which is missing the last nine amino acids of
pp60v-src) is similar to that of pp60v-src (Fig. 6; Table 3),
showing that most of the pp60v-src carboxyl-terminal region
which diverges from pp60c-src is not required for kinase
activity. However, specific kinase activity decreases
abruptly with removal of one more residue (Leu-517), suggesting
that this is near the C-terminal boundary of a part of
the kinase catalytic domain. This is consistent with the
finding of Gentry et al. (17) that antipeptide serum specific
for pp60v-src residues 498 to 512 inhibits phosphorylation of
exogenous substrates, while antiserum specific for the terminal
six residues does not inhibit kinase activity. It is
interesting that this is at the boundary of a region of
homology shared by the sequenced protein-tyrosine kinases
(21). Since pp50-pp90-pp60v-src complexes have reduced
immunoglobulin G kinase activity (4), it is possible that the
reduction in pp60src kinase specific activity is caused by
increased association with these proteins.

Relative pp60src turnover rates were calculated from the
relative amounts of radioactive label incorporated into the
r-src proteins during periods which were short or long
relative to the protein half-lives and showed that deletion of
leucine-516 (in pp60RS9) results in a protein which is
degraded at least five times faster than SR-D pp60v-src (Table
3). Pulse-chase experiments with pp60src proteins from
coselected cell lines confirmed these differences in relative
stabilities (data not shown), but effects due to reutilization of
radioactive amino acids precluded the determination of
absolute half-lives. (Studies with cultured cells have shown
that over 80% of the amino acids labeled during the pulse
period are reincorporated into protein synthesized during the
chase period [45]; thus, half-lives measured in this type of
experiment can be artificially extended [27, 43].)

The removal of leucine-516 may affect protein stability by
altering protein conformation, transport, localization, or
binding to other proteins. Our data do not indicate whether
strict conservation of this residue is required for stability or
whether replacement with other hydrophobic residues might
preserve protein stability and activity. For example,
pp60RS9 instability might be caused by the introduction of

TABLE 4. Conservation of leucine-516(a)

Protein               Amino acid sequence

Chicken c-src         ...WRRDPEERPTFEYLQAFLEDYFTSTEPQYQPGENL
Human c-src I         ...WRREPEERPTFEYLQAFLEDYFTSTEPQYQPGENL
Human c-src II        ...WRLEPEERPTFEYLQSFLEDYFASTEPQYQPGDQT
Xenopus c-src         LQAFLEDYFTATEPQYQPGDNL
Drosophila src        ...WDAVPEKRPTFEFLNHYLESFSVTSEVPYREVQN
v-src                 ...WRRDPEERPTFEYLQAQLLPACVLEVAE
v-fgr                 ...WRLDPEERPTFEYLQSFLEDYFNGPQQN
v-yes                 ...WKKDPDERPTFEYIQSFLEDYFTAAEPSGY
v-ros                 ...WAQDPHNRPTFFYIQHKLQEIRHSPLCFSYFLGDK + 15 aa
v-fps                 ...WEYDPHRRPSFGAVHQDLIAIRKRHR
v-fes                 ...WAYEPGQRPSFSAIYQELQSIRKRHR
v-fms                 ...WALEPTRRPTFQQICSLLQKQAQEDRRVPNYTNLP + 16 aa
c-fms                 ...WALEPTHRPTFQQICSFLQEQAQEDRRERDYTNLP + 45 aa
v-mil (v-mht)         ...LKKVREERPLFPQI l'Il LQHSLPKLNRSASEPSLH + 21 aa
v-raf                 ...VKKVKEERPLFPQ 1 l1?^ LQHSLPKLNRSAPEPSLH + 22 aa
Drosophila abl        ...WQWDATDRPTFKSIHHALEHMFQVGNV
lskT (lck)            ...WKERPEDRPTFDYLRSVLDDFFTATEGQYQPQP
Human insulin rec.    ...WQFNPNMRPTFLEIVNLLKDDLHPSFPEVS
v-mos                 ...IQSCWEARGLQRPSAELLQRDLKAFRGTLG
Human EGF rec.        ...WMIDADSRPKFRELIIEFSKMARDPQRYLVIQGDE + 225 aa
c-erbB-1              ...WMIDSECRPRFRELVSEFSRMARDPQRFVVIQNED + 262 aa
v-erbB                ...WMIDADSRPKFRELIAEFSKMARDPDRYLVIQGDE + 199 aa
neu                   ...WMIDSECRPRFRELVSEFSRMARDPQRFVVIQNED + 263 aa
v-abl                 ...WQWNPSDRPSFAEIHQAFETMFQESSISDEVEKEL + 287 aa

(a) The amino acid sequence in the carboxyl-terminal region of pp60v-src is aligned with the homologous sequences from 23 other protein-tyrosine kinase-related
proteins encoded by chicken c-src (59), human c-src I (1, 39), human c-src II (39), Xenopus c-src (54), Drosophila src and abl (20), v-src (11, 46, 58), v-fgr (36),
v-yes (26), v-ros (37), v-fps (52), v-fes (19), v-fms (18), c-fms (9), v-mil (v-mht) (25, 56), v-raf (32), lskT (lck) (33, 63), human insulin receptor (61), v-mos (62), human
epidermal growth factor (EGF) receptor (12, 61), c-erbB-1 (67), v-erbB (66), neu (2), and v-abl (44). The chicken c-src, v-src, v-fgr, v-yes, v-ros, v-fps, v-fes, v-abl,
human EGF receptor, and v-erbB sequences were aligned as in Hunter and Cooper (21). The additional cellular src, c-erbB-1, neu, and abl sequences were aligned
by obvious homology. Only part of the Xenopus c-src sequence has been published. The c-fms, v-fms, v-mil, v-raf, and human insulin receptor sequences were
aligned by using the almost completely conserved sequence RPXF at src positions 506 to 509. In agreement with Mark and Rapp (32), the v-mos sequence was
aligned at the conserved R at src position 506. Beyond adjustment of the amino acids SIA and SIE in the v-mil and v-raf sequences, no attempt has been made
to maximize homologies by relative insertions or deletions. aa, Amino acids.

a negative charge (from the COOH terminus) into an internal
region of the protein. While a complete understanding of the
significance of Leu-516 may have to await determination of
the pp60v-src three-dimensional structure, comparison with the
homologous domains from other sequenced protein-tyrosine
kinases and related proteins (Table 4) shows that this residue
is highly conserved (present in 18 of 23 sequences), particularly
when compared with the conservation of its immediately
neighboring residues. It is completely conserved in the
proteins in which it is located near the carboxyl terminus and
is invariably altered to another hydrophobic residue,
phenylalanine, in the proteins in which it is located more
internally in the primary sequence. Sequence comparison
throughout the entire catalytic domain (21) shows that
leucine-516 is the furthest downstream of the highly conserved
residues.
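
The positional comparison described above can be sketched in Python for a few of the Table 4 sequences. The sketch assumes that the conserved RPXF motif marks src positions 506 to 509 (as stated in the Table 4 footnote), so that the residue ten positions after the motif's R corresponds to position 516. The helper name and the choice of five sequences are illustrative only, and no insertions or deletions are modeled.

```python
import re

# Carboxyl-terminal sequences as listed in Table 4 (spaces removed)
SEQUENCES = {
    "chicken c-src": "WRRDPEERPTFEYLQAFLEDYFTSTEPQYQPGENL",
    "v-src": "WRRDPEERPTFEYLQAQLLPACVLEVAE",
    "v-fps": "WEYDPHRRPSFGAVHQDLIAIRKRHR",
    "human insulin receptor": "WQFNPNMRPTFLEIVNLLKDDLHPSFPEVS",
    "human EGF receptor": "WMIDADSRPKFRELIIEFSKMARDPQRYLVIQGDE",
}

def residue_at_src_516(sequence):
    """Return the residue aligned with src position 516, or None.

    The R of the first RPXF match is taken as position 506, so position 516
    lies ten residues downstream of it.
    """
    match = re.search(r"RP.F", sequence)
    if match is None:
        return None
    index = match.start() + 10
    return sequence[index] if index < len(sequence) else None

aligned = {name: residue_at_src_516(seq) for name, seq in SEQUENCES.items()}

for name, residue in aligned.items():
    print(f"{name}: residue at position 516 = {residue}")
```

For these five sequences the aligned residue is leucine in every case except the EGF receptor, where it is phenylalanine, in line with the pattern described in the text for proteins in which this position lies further from the carboxyl terminus.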

This residue is also notable in that, in the cellular
proto-oncogene product pp60c-src, it marks the amino boundary of
a region which appears to have the hallmarks of a compact
loop structure (a strongly hydrophilic region preferentially
containing Gly, Pro, Asp, Asn, Ser, and Tyr flanked by
hydrophobic residues [J. F. Leszczynski and G. D. Rose,
submitted for publication]). This potential structure extends
to the carboxyl terminus of pp60c-src and includes its major
site of tyrosine phosphorylation (6). The observation that
phosphorylation of this tyrosine lowers pp60c-src kinase
specific activity (8) suggests that this region of the pp60c-src
molecule provides some negative regulatory function. This
suggestion is supported by the finding that replacement of
this region in psrc11 with the corresponding region of c-src in
constructs similar to those discussed here almost completely
eliminates transforming activity in NIH 3T3 cells (49; P.
Yaciuk, unpublished data). Thus, the primary significance of
the pp60v-src carboxyl terminus may be just that it eliminates
the negatively acting pp60c-src carboxyl terminus. From this
standpoint, the fact that a wide class of modifications of the
pp60v-src carboxyl terminus do not eliminate transforming
activity is not surprising.

While protein-tyrosine kinases share a region of sequence
identity corresponding to their kinase domains, the specific
elements essential for catalysis, substrate binding and
substrate specificity are largely undefined. The P130gag-fps
transforming protein of Fujinami avian sarcoma virus is a
cytoplasmic tyrosine kinase with a complex structure that
includes a C-terminal kinase domain. To identify the
precise N-terminal border of the v-fps catalytic region and
to assess its interactions with non-catalytic domains,
C-terminal v-fps polypeptide fragments of decreasing size
were expressed in E. coli as trpE-v-fps hybrid proteins. All
such polypeptides containing 263 or more residues derived
from the C-terminus of P130gag-fps (i.e. residues 920-1182)
were enzymatically active as tyrosine kinases. They
autophosphorylated at physiological sites in vivo and
phosphorylated exogenous substrates such as enolase and
poly(glu,tyr) at tyrosine in vitro. Deletion of a further five
amino acids from P130gag-fps residues 920-925 abolished
all enzymatic activity. This deletion coincides with the
predicted N-terminus of the v-fps ATP-binding site at
residue 922. These data indicate that the N-terminal border
of the ATP-binding site defines the start of the minimal
v-fps tyrosine kinase catalytic domain, and show that this
minimal domain is competent to bind substrates. More
N-terminal non-catalytic sequences appear to functionally
interact with the catalytic domain.



Introduction

Protein-tyrosine kinases (PTKs) are complex polypeptides
composed of multiple structural and functional
domains (Levinson et al., 1981; Stoker et al., 1984; Stone
et al., 1984; Stone & Pawson, 1985). The catalytic
activities of these enzymes are apparently modified by
regulatory domains with a variety of functions, including
targetting of newly synthesized protein to the plasma
membrane and direct modulation of kinase activity
(Garber et al., 1985; Pellman et al., 1985; Cooper et al.,
1986; Sadowski et al., 1986). It also seems likely that
non-catalytic sequences will prove to be involved in substrate
recognition (Sadowski et al., 1986; Jove et al., 1986).
Structural alterations in one or more of these regulatory
domains leading to derepressed enzymatic activity are
frequently involved in the oncogenic activation of PTKs
(Downward et al., 1984; Shtivelman et al., 1985; Yaciuk &
Shalloway, 1986).

For a PTK, the precise definition of the catalytic
domain and its intramolecular interactions with adjacent
elements is important in understanding the mechanisms
by which enzymatic activity is regulated and cellular
targets are selected. We have previously employed a
variety of genetic and biochemical techniques to investigate
the domain structure of the P130gag-fps tyrosine
kinase encoded by Fujinami avian sarcoma virus (FSV)
and the functions of specific amino acids within its kinase
region (Stone et al., 1984; Weinmaster et al., 1983, 1984,
1986). The catalytic domain of P130gag-fps can be isolated
by partial proteolysis within a 29-kDa C-terminal fragment
that retains enzymatic activity (Weinmaster et al.,
1983). In keeping with this observation, approximately
260 residues near the C-terminus of P130gag-fps (residues
911-1173) show sequence identity with all other PTKs
and to a lesser extent with serine/threonine-specific
protein kinases. Immediately to the N-terminus of the
predicted kinase domain is a sequence of approximately
90 amino acids (residues 810-900) which is shared with
other cytoplasmic tyrosine kinases such as p60src, and
which may associate with cellular proteins that regulate or
mediate kinase function (Sadowski et al., 1986). We have
termed this domain SH2 (for src homology 2). The
proposed SH2 and catalytic domains are joined to an
N-terminal fps-specific domain unnecessary for kinase
activity but important for the induction of neoplastic
transformation (Stone et al., 1984; Stone & Pawson,
1985).

As a means to identifying the precise boundaries of
P130gag-fps domains and to investigate their functional
interactions, we have constructed plasmid vectors for the 
synthesis of v-fps polypeptides in E. coli. By manipulation 
of the coding sequences for the expressed proteins we 
define the N-terminal boundary for the kinase catalytic 
domain, and identify potential interactions with non- 
catalytic sequences. 



Results 

High level expression of v-fps polypeptides in E. coli

v-fps coding sequences derived from the FSV genome
were expressed as trpE-v-fps polypeptides in E. coli
following insertion into the multiple cloning site of pATH
vectors (Figure 1). These plasmids allow the expression of
foreign sequences as hybrid proteins whose N-termini are
encoded by the bacterial trpE gene. Synthesis of fusion
proteins can be induced by tryptophan starvation and
growth of bacteria in the presence of indole acrylic acid.
Two sets of expression plasmids were constructed, as
detailed in Figure 1 and in Materials and methods. Table

Figure 1 Construction of trpE-v-fps bacterial expression vectors. v-fps sequences encoding the C-terminal region of P130gag-fps of
Fujinami avian sarcoma virus (FSV) were excised from wt pJ2 plasmid (a and b) or the RX18m, RX15m and AX9m FSV
mutant plasmids containing unique XhoI sites (c). Fragments with blunt ends generated by digestion with SmaI or PvuII/SmaI were
cloned into the SmaI site of pUC vectors (a). The BspMI/SmaI digested wt DNA was first made blunt by treatment with Klenow enzyme
(b). C-terminal encoding fragments from XhoI linker mutants were cloned into the SalI/HindIII sites of pUC8 (c). The various sized v-fps
pUC inserts were then subcloned into the proper reading frame of pATH-1, pATH-3 or pATH-11 with EcoRI and HindIII (a, b, c).
For construction of Bal31 deletions, pTF729 (b bottom) or pTF893 (b bottom) were digested with KpnI, treated with Bal31 nuclease and Klenow
enzyme and then religated to the EcoRV site of the trpE coding sequence (d). Abbreviations are: B = BamHI, H = HindIII, K = KpnI,
P = PvuII, R = EcoRI, S = SstI, Sm = SmaI, Bs = BspMI, Nt = NotI, X = XhoI, RV = EcoRV



1 summarizes the coding potential of these trpE-v-fps
vectors. One class of plasmids, denoted pTF, encode
proteins with 323 N-terminal trpE amino acids (37 kDa)
linked in-frame to a nested set of v-fps C-terminal
sequences. A second series of plasmids designated pTdl
encode fusion proteins with only 42 N-terminal trpE
residues joined to a distinct set of v-fps polypeptides (see
Table 1). In each case, the residue within P130gag-fps
corresponding to the first v-fps amino acid in the trpE-v-fps
fusion protein is indicated in the plasmid name; all
fusion proteins have a common C-terminus corresponding
to the C-terminal amino acid (residue 1182) of P130gag-fps.

Following induction, RR1 E. coli containing the plasmid
constructs were found to abundantly express new



Table 1 Coding potential of trpE-v-fps expression plasmids

Plasmid    trpE amino   v-fps amino   Protein molecular   v-fps end
           acids (a)    acids (c)     weight (d)          derived from (e)

pTF49      1-323 (b)    49-1182       167.3               SmaI   pJ2
pTdl359    1-42         359-1182       99.4               NcoI   pTF49
pTF637     1-323 (b)    637-1182       99.6               XhoI   pRX18m
pTF729     1-323 (b)    729-1182       89.1               PvuII  pJ2
pTF822     1-323 (b)    822-1182       78.4               XhoI   pRX15m
pTdl823    1-42         823-1182       46.1               Bal31  pTF729
pTF833     1-323 (b)    833-1182       77.1               XhoI   pAX9m
pTdl835    1-42         835-1182       44.7               Bal31  pTF729
pTdl878    1-42         878-1182       39.8               Bal31  pTF729
pTF893     1-323 (b)    893-1182       70.2               BspMI  pJ2
pTdl919    1-42         919-1182       35.1               Bal31  pTF893
pTdl920    1-42         920-1182       35.0               Bal31  pTF893
pTdl925    1-42         925-1182       34.4               Bal31  pTF893
pTdl935    1-42         935-1182       33.2               Bal31  pTF893
pTdl938    1-42         938-1182       32.9               Bal31  pTF893

(a) Numbered according to Yanofsky et al. (1981)
(b) The pTF constructs contain from 2 to 7 amino acids at the trpE-v-fps junction sequence encoded by the pUC and pATH polylinker region
(c) Numbered according to Shibuya & Hanafusa (1982)
(d) Calculated molecular weight in kDa based on amino acid sequences
(e) Parent plasmid for construction and enzyme used to generate the 5' end of v-fps coding sequences

Figure 2 Expression and phosphorylation of trpE-v-fps fusion proteins in E. coli. a: pTF plasmids. E. coli containing pATH-1 (lanes 1
and 7), pTF637 (lanes 2 and 8), pTF729 (lanes 3 and 9), pTF822 (lanes 4 and 10), pTF833 (lanes 5 and 11) and pTF893 (lanes 6 and 12)
plasmids were grown in suspension culture and the tryptophan operon was induced for 2 h. The induced cells were metabolically labeled
with 32Pi, lysed in SDS-urea buffer, and whole cell lysates analyzed by electrophoresis through 10% SDS-polyacrylamide gels (see
Materials and methods). Proteins were detected by staining with Coomassie blue (lanes 1-6); phosphoproteins were identified by
autoradiography of the same gel for 4 h (lanes 7-12). b: pTdl plasmids. E. coli containing pATH-1 (lanes 1 and 10), pTdl359 (lanes 2 and
11), pTdl823 (lanes 3 and 12), pTdl878 (lanes 4 and 13), pTdl919 (lanes 5 and 14), pTdl920 (lanes 6 and 15), pTdl925 (lanes 7 and 16),
pTdl935 (lanes 8 and 17) and pTdl938 (lanes 9 and 18) were grown, induced and labeled with 32Pi as above. Proteins were analyzed by
electrophoresis and staining with Coomassie blue (lanes 1-9), followed by autoradiography for 4 h (lanes 10-18). The mobilities of
markers and their molecular weight (x 10^-3) are indicated. The positions of the 37-kDa pATH protein and of the trpE-v-fps fusion

proteins are indicated by arrows

Table 2 Kinase activity of trpE-v-fps polypeptides

                             Kinase activity (a)

             Autophosphorylation          Phosphorylation
trpE-v-fps   in vivo (b)   in vitro (c)   enolase (d)   poly(glu,tyr) (e)   (% Soluble
protein                                                                      Protein)

pATH              0             0              0               0                73
pTdl359          16            62             68             722                20
pTdl823         389           340            280            3200                 1
pTdl878         218           200            231            7308                 6
pTdl919          12             3              4              60                11
pTdl920           5             0              0              35                18
pTdl925           0             0              0               0                12
pTdl935           0             0              0               0                11
pTdl938           0             0              0               0                10

pTF637           14            27             36             762                42
pTF729            8            24             70            1446                20
pTF822           25            93             91            1304                 7
pTF833           17            35             37             877                10
pTF893            5            30            124            1498                44

(a) Kinase activity is measured as c.p.m. of 32P (x 10^-3) incorporated per 40 pmole soluble trpE-v-fps protein
(b) 32P metabolically incorporated into trpE-v-fps fusion protein
(c) Autophosphorylation of trpE-v-fps proteins in crude bacterial lysates
(d) Phosphorylation of acid-denatured enolase by trpE-v-fps fusion proteins in crude bacterial lysates
(e) Phosphorylation of poly(glu,tyr) (80:20) by crude bacterial lysates



proteins of the predicted sizes (Figure 2a, lanes 1-6; and 
Figure 2b, lanes 1-9). Only the product of pTF49 with a 
molecular weight of 167kDa did not accumulate to 
significant levels (data not shown). In some cases the
identities of these novel polypeptides were confirmed by
their specific immunoprecipitation from lysates of
[35S]methionine-labeled E. coli with anti-fps rat antiserum
against P130gag-fps (data not shown). The trpE-v-fps
proteins expressed from these plasmids represented up to 
15-20% of total cellular protein synthesized during the 
induction period. The percentage of trpE-v-fps protein 
which was soluble following bacterial cell lysis varied 
widely from 1% (for pTdl823) to 44% (for pTF893) as 
detailed in Table 2. 

Tyrosine phosphorylation of trpE-v-fps fusion proteins in
bacteria defines a minimal catalytic domain required for
autophosphorylation

Following induction of the tryptophan operon, E. coli
containing the pATH or trpE-v-fps expression plasmids
were metabolically labeled with 32Pi for 20 min, and
lysates of the radiolabeled cells were analyzed by
electrophoresis (Figure 2). Autoradiography of the stained gels
showed that all of the pTF trpE-v-fps fusion proteins
incorporated 32P (Figure 2a, lanes 8-12), unlike the
parental 37-kDa trpE protein (Figure 2a, lane 7).

Phosphoamino acid analysis of these labeled pTF
trpE-v-fps proteins showed that they were phosphorylated
exclusively at tyrosine (Figure 3). Furthermore, tryptic
phosphopeptide analysis revealed that they were
phosphorylated at the same tyrosine residues in E. coli as was
P130gag-fps in rat-2 cells (see below). These results suggested
that these bacterial trpE-v-fps proteins were
enzymatically active tyrosine kinases capable of authentic
autophosphorylation. The smallest pTF protein
(pTF893) contains residues 893-1182 of
P130gag-fps, which are clearly capable of catalyzing
autophosphorylation.

The span of pTdl hybrid proteins is considerably
greater than that of the pTF products (see Table 1). The
largest (pTdl359) contains virtually the entire v-fps
coding sequence including all FSV coding elements
required for neoplastic transformation of chicken embryo
fibroblasts (Foster & Hanafusa, 1984), whereas the
smallest (pTdl938) lacks a significant part of the highly
conserved sequence common to all PTKs. To assay the
ability of the pTdl proteins to autophosphorylate in vivo,
induced cells were labeled with 32Pi and whole cell lysates
were analyzed by electrophoresis and autoradiography
(Figure 2b, lanes 11-18). 32P incorporation was detected
for pTdl920 and all larger proteins (Figure 2b, lanes 11-15),
suggesting that N-terminal v-fps sequences could be
deleted to P130gag-fps amino acid 920 while still retaining
autophosphorylating catalytic activity. The only prominent
phosphoproteins in these cells were the pTdl trpE-v-fps
polypeptides themselves. No phosphorylation was
detected for pTdl925-938 (lanes 16-18) or for the
parental pATH protein (lane 10), indicating that the
ability of v-fps polypeptides to autophosphorylate in
bacteria was entirely lost with the deletion of the five
amino acids between P130gag-fps residues 920 and 925.

Quantitation of the in vivo phosphorylation of trpE-v-fps
proteins (Table 2) indicated that the C-terminal v-fps
protein fragments initiating at residues 919 and 920 had 
significantly lower autophosphorylating activity in vivo 
than the pTdl878 product with 41 additional v-fps amino 
acids. 

The minimum v-fps catalytic domain phosphorylates 
exogenous substrates 

We investigated to what extent the deletion of N-terminal
sequences would affect the ability of v-fps fragments to
phosphorylate exogenously added substrates in vitro. A
sensitive and specific substrate for assaying tyrosine
kinase activity in crude cell lysates is a random polymer
of glutamic acid and tyrosine (Schieven et al., 1986).
Lysates of induced E. coli cultures were incubated with
poly(glu,tyr) in the presence of [γ-32P]ATP. The polymer
was separated from the reaction mixture on non-sodium
dodecyl sulfate (SDS) denaturing polyacrylamide gels.

Figure 3 Phosphoamino acid analysis of trpE-v-fps fusion proteins. trpE-v-fps fusion proteins labeled in vivo with 32Pi were recovered
from polyacrylamide gels, hydrolyzed in 6 N HCl, and analyzed for phosphoamino acid content by two-dimensional separation using
electrophoresis at pH 1.9 and 3.5. The migration of phosphoamino acid markers, revealed by ninhydrin staining, is indicated for
phosphoserine (S), phosphothreonine (T) and phosphotyrosine (Y)



After extensive washing to remove unincorporated label,
the gels were dried and autoradiographed. As shown in
Figure 4 and quantitated in Table 2, the abilities of the
pTF and pTdl proteins to phosphorylate poly(glu,tyr)
roughly correlated with autophosphorylation in vivo.
pTdl919 and 920 both phosphorylated the substrate,
while pTdl925 had no activity. Thus, enzymatic function
was again abolished by the loss of v-fps residues 920-925.
The activity of pTdl920 in poly(glu,tyr) phosphorylation
was slightly lower than that of pTdl919, which in turn
was markedly less active than pTdl878 (Table 2). Enolase
is a physiological substrate for P130gag-fps in FSV-transformed
fibroblasts, and we therefore determined the capacity
of the bacterial trpE-v-fps polypeptides to phosphorylate
enolase in vitro. Acid-denatured enolase (5 µg) was
incubated with crude lysates of induced E. coli cultures in
the presence of [γ-32P]ATP. Enolase was specifically
phosphorylated by all of the pTF trpE-v-fps proteins
(Figure 5a). Indeed in each case only enolase and the pTF
trpE-v-fps product were strongly phosphorylated, despite
the presence of normal bacterial proteins. All of the pTdl
proteins larger than pTdl919 inclusive phosphorylated
enolase (Figure 5b). The in vitro kinase activity of the
pTdl919 protein measured by enolase phosphorylation
was considerably lower than that of the pTdl878 product.
Longer exposure of the autoradiograph of Figure 5b
showed that pTdl919 autophosphorylated in vitro as well
as phosphorylating enolase. No in vitro kinase activity
assayed by enolase phosphorylation and
autophosphorylation was detected for pTdl920-938 (Figure 5b,
lanes 13-16) or for the 37-kDa pATH protein (Figure 5a,
lane 7).

The specific activities of the various trpE-v-fps proteins
for phosphorylation of enolase and poly(glu,tyr) were
quantitated following correction for inactive insoluble
trpE-v-fps protein, and were compared with the values
obtained for autophosphorylation in vitro and in vivo
(Table 2). The activities of the pTF proteins were quite
similar for all parameters, differing at most by 3- to [...]-fold
between pTF637 and pTF893. The activities of the pTdl
trpE-v-fps proteins showed more variation. [...]
pTdl878 were about 5-fold more active [...],
while the activity of pTdl919 was 50- to [...]-fold less than
that of pTdl878. pTdl920 was less active than

Figure 4 Phosphorylation of poly(glu,tyr) (80:20) by trpE-v-fps
fusion proteins. 10 µg poly(glu,tyr) polymer was incubated with
crude lysates of induced E. coli containing expression plasmids (as
indicated) in the presence of 5 µCi [γ-32P]ATP for 15 min at 37°C.
poly(glu,tyr) was separated from the reaction mixture by
electrophoresis on a 5% polyacrylamide-urea gel. Unincorporated label
was washed from the gel which was then dried and
autoradiographed for 4 h. The position of poly(glu,tyr) is indicated



pTdl919, and indeed possessed negligible ability to
phosphorylate enolase or to autophosphorylate in vitro.
pTdl925, 935 and 938 were inactive in every assay. Since
the products of these latter constructs were more soluble
than polypeptides with readily detectable kinase activity
(Table 2), solubility cannot have been a factor in their lack
of kinase function.
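
The comparisons in the paragraph above are ratios of Table 2 entries. As a minimal sketch only (Python is used here for illustration; the dictionary layout and the function name fold_difference are assumptions and not part of the original work, and the values are transcribed from Table 2), the fold difference between the pTdl878 product and the two shortest active constructs can be tabulated for each assay:

```python
# Activities transcribed from Table 2 (units as defined in its footnote a).
# Column order: autophosphorylation in vivo, autophosphorylation in vitro,
# enolase phosphorylation, poly(glu,tyr) phosphorylation.
ASSAYS = ("autophos_in_vivo", "autophos_in_vitro", "enolase", "poly_glu_tyr")

TABLE2 = {
    "pTdl878": (218, 200, 231, 7308),
    "pTdl919": (12, 3, 4, 60),
    "pTdl920": (5, 0, 0, 35),
}


def fold_difference(reference, test):
    """Return {assay: reference/test}; None where the test activity is 0."""
    result = {}
    for assay, ref, val in zip(ASSAYS, TABLE2[reference], TABLE2[test]):
        # A ratio is undefined when no activity was detected for the test construct.
        result[assay] = round(ref / val, 1) if val else None
    return result


if __name__ == "__main__":
    for construct in ("pTdl919", "pTdl920"):
        print(construct, fold_difference("pTdl878", construct))
```

The ratios differ from assay to assay, so a single fold value quoted for a pair of constructs should be read as approximate.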

It is of interest to note that the pTF proteins with 323 
N-terminal trpE residues were consistently less active 
than similar pTdl proteins with only 42 trpE amino acids 
(Table 2). Thus, the pTF822 product was about 3-fold less
active than pTdl823, even though they differ by only a
single v-fps residue. The extent of N-terminal trpE
sequences may therefore have a quantitative effect on
v-fps kinase activity.



Deletion of the SH2 non-catalytic domain affects the 
pattern of v-fps tyrosine autophosphorylation 

Tryptic phosphopeptide analysis of the bacterially
produced pTdl trpE-v-fps proteins showed that they
autophosphorylated at the same tyrosine sites in E. coli as
did P130gag-fps in rat-2 cells. Tryptic phosphopeptide maps
for the pTdl835 and pTdl878 proteins from bacteria and
for wild type P130gag-fps from FSV-transformed rat-2 cells,
all isolated following metabolic labeling of bacterial or
mammalian cells, are shown in Figure 6.
Phosphotyrosine-containing peptides 3a-3c, 4 and 7 of
P130gag-fps co-migrated with the corresponding peptides of
the bacterial pTdl835 protein (data not shown). Peptide 1
of P130gag-fps has been mapped to the gag region and
therefore is not seen in the bacterial proteins, while
peptides 5, 6 and 8 contain phosphoserine. The major site
of autophosphorylation, tyr-1073, is represented by
peptides 3a-3c (Weinmaster et al., 1984). Two minor peptides
were consistently seen in the bacterial proteins which
do not migrate with any peptides of P130gag-fps. These are
indicated with arrows on the maps of pTdl835 and
pTdl878. The peptide map of pTdl878 shows that the
deletion of 43 amino acids from pTdl835 resulted in the
loss of phosphotyrosine-containing P130gag-fps spots 4 and
7. These data are consistent with two possibilities. The
tyrosine residues that are phosphorylated in spots 4 and 7 
may be located within the SH2 region; alternatively, the 
tyrosines may be located more C-terminally within the 
catalytic domain, but their ability to act as phosphoaccep- 
tors may be affected by the deletion of the SH2 non- 
catalytic domain. Because this deletion has no effect on 
autophosphorylation at tyr-1073, we favor the former 
possibility. 



Discussion 

We have constructed bacterial expression vectors encoding
a nested set of v-fps polypeptide fragments to precisely
define the start of the v-fps tyrosine kinase domain, and to
investigate the influence of N-terminal non-catalytic
sequences on kinase activity and substrate recognition.

The C-terminal kinase domain of v-fps shares amino
acid homology with other cytoplasmic tyrosine kinases
from amino acid 810 virtually to the C-terminus.
However, homology with transmembrane tyrosine kinases
begins only at lysine-911, which we have previously
suggested is close to the start of the catalytic domain
(Sadowski et al., 1986) (see Figure 7). Consistent with this
notion we find that a polypeptide containing P130gag-fps
residues 878-1182 (in pTdl878) has full kinase activity,
that proteins with residues 919- and 920-1182 have
considerably lower but readily detectable activity, but
that deletion of a further five amino acids to residue 925
eliminates all catalytic function. We conclude that the
N-terminal boundary of v-fps sequences absolutely required
for catalytic activity lies between residues 920-925 of
P130gag-fps (diagrammed in Figure 7).

We have previously tentatively located the N-terminal
border of the P130gag-fps ATP-binding site at residue 922
(Weinmaster et al., 1986), and therefore the deletion to
residue 925 would infringe on this nucleotide-binding site.
Amino acid sequences between leu-922 and lys-950 are
implicated in ATP-binding on two counts. Lys-950 of
P130gag-fps is an essential residue conserved among
protein kinases (Hunter & Cooper, 1985), and may

Figure 5 In vitro autophosphorylation and phosphorylation of enolase by pTF (a) or pTdl (b) bacterial trpE-v-fps fusion proteins.
Lysates were made from induced E. coli containing the following plasmids. a: pATH (lanes 1 and 7), pTF637 (lanes 2 and 8), pTF729
(lanes 3 and 9), pTF822 (lanes 4 and 10), pTF833 (lanes 5 and 11), pTF893 (lanes 6 and 12). b: pTdl359 (lanes 1 and 9), pTdl823 (lanes 2
and 10), pTdl878 (lanes 3 and 11), pTdl919 (lanes 4 and 12), pTdl920 (lanes 5 and 13), pTdl925 (lanes 6 and 14), pTdl935 (lanes 7 and
15), and pTdl938 (lanes 8 and 16). Lysates were incubated with 5 µg acid denatured enolase in the presence of 5 [...]
at 37°C. Reactions were analyzed by electrophoresis on 10% SDS-polyacrylamide gels, staining with Coomassie blue (a, lanes 1-6; b,
lanes 1-8) and autoradiography for 4 h (a, lanes 7-12; b, lanes 9-16). The positions of the pATH and trpE-v-fps proteins
are indicated with arrows

Figure 6 Comparative tryptic phosphopeptide analysis of trpE-v-fps bacterial fusion proteins and Fujinami avian sarcoma virus (FSV)
P130gag-fps. Two-dimensional tryptic peptide analysis was undertaken as follows: P130gag-fps, P130gag-fps isolated from 32P-labeled
FSV-transformed rat-2 cells; pTdl835, pTdl835 P44 protein from 32P-labeled E. coli; pTdl878, pTdl878 P40 32P-labeled bacterial
protein. Gel-purified proteins were oxidized, digested with trypsin and separated in two dimensions. Electrophoresis at pH 2.1 was from
left to right and chromatography in N-butanol:acetic acid:water:pyridine (75:15:60:50 by volume) was from bottom to top. Peptides 1,
3a-3c, 4 and 7 of FSV P130gag-fps are known to contain phosphotyrosine, while peptides 5, 6 and 8 contain phosphoserine (Weinmaster et
al., 1983, 1984). Arrows indicate phosphorylated peptides specific to the trpE-v-fps fusion proteins in E. coli

Figure 7 Schematic summary of v-fps domain structure indicated by trpE-v-fps bacterial expression vectors. The structure of Fujinami
avian sarcoma virus P130gag-fps at top shows the kinase catalytic and SH2 regions. Confirmed and predicted sites of tyrosine
phosphorylation are given. The N-terminal border of the kinase domain is deduced from the catalytic activities of v-fps polypeptide
fragments expressed in E. coli, as shown. At bottom, a partial amino acid sequence of P130gag-fps residues 906-950 is given. The start of
homology of P130gag-fps with all protein-tyrosine kinases (PTK) (Hunter & Cooper, 1985; Sadowski et al., 1986), the positions of amino
acids 919, 920 and 925, as well as the previously predicted borders of the ATP-binding site (Weinmaster et al., 1986) are indicated



interact with the γ-phosphate of bound ATP (Kamps et
al., 1984; Weinmaster et al., 1986). Substitution of this
residue with arginine by oligonucleotide-directed
mutagenesis has shown that lys-950 is critical for
P130gag-fps kinase activity, and may participate directly in
catalyzing phosphotransfer at the active site (Weinmaster
et al., 1986). In addition, residues 922-950 show primary
structural similarity to the adenine nucleotide-binding
sites of NAD-binding dehydrogenases such as
glyceraldehyde 3-phosphate dehydrogenase and the AMP-binding
site of adenylate kinase for which the complete tertiary
structures have been solved (Buehner et al., 1974; Schulz et
al., 1974; Weinmaster et al., 1986). The nucleotide-binding
folds of these proteins assume the conformation β1
strand-loop-α helix-β2 strand, forming a hydrophobic
pocket for the adenine group from which the ribose and
phosphates protrude (Buehner et al., 1974; Pai et al.,
1977). Given the collateral evidence that this is indeed the
protein kinase ATP-binding site, it seems likely that
residues 922-950 of P130gag-fps adopt a similar
conformational structure (Sternberg & Taylor, 1984). In this
case, residues 922-928 of P130gag-fps would form a β
strand, while the following highly conserved glycine-rich
sequence (residues 928-933 of P130gag-fps) would form a
loop joining this β strand to an α helix. Deletion of v-fps
sequences to amino acid 925 would remove most of the
predicted β1 strand of the ATP-binding site and might
thereby abolish catalytic function by interfering with
ATP binding.

Since all the pTdl bacterial proteins had 42 N-terminal 
trpE amino acids, it might be argued that the loss of 
catalytic function in the pTdl925, 935 and 938 products 
resulted not from deletion of sequences required for 
catalysis but from conformational distortion of the 
catalytic domain induced by proximity of the short trpE 
sequence. However, the precise correspondence observed
between complete loss of kinase activity and encroachment
on the predicted ATP-binding site suggests that this
assay is a reliable indicator of v-fps elements absolutely
required for catalytic activity.

The relatively small, but detectable, levels of kinase
activity exhibited by the pTdl919 and 920 proteins
suggest that N-terminal v-fps sequences can be deleted to
the start of the ATP-binding site without preventing
ATP- or substrate-binding or the transfer of phosphate
from one to the other. Sequences N-terminal to residue
919 are therefore not obligatory for binding and
phosphorylation of substrates such as enolase and
poly(glu,tyr). Several observations, however, suggest that
sequences within the non-catalytic domain immediately
N-terminal to the core catalytic region formed by P130gag-fps
residues 920-1182 may interact with this region and
influence kinase function.

Firstly, the additional 41 residues in the pTdl878
protein stimulate activity by about 100-fold compared
with the pTdl919 protein. These sequences may,
therefore, contribute to some aspect of catalytic function,
or may stabilize the structure of the catalytic domain.
Secondly, in analyzing the phosphopeptides of the
bacterial trpE-v-fps proteins we observed that
autophosphorylation at sites represented by tryptic
phosphopeptides 4 and 7 was lost in the pTdl878 protein but was
present in the pTdl835 product. The simplest explanation
for these data is that these sites are located between
P130gag-fps residues 835 and 878 within the SH2 domain,
although the results could also be interpreted as an
indirect effect of the deletion. In either event this
observation implies an interaction between the SH2 domain and
the core catalytic region. We have suggested that the SH2
domain of P130gag-fps is involved in the recognition of
cellular proteins in vertebrate cells (Sadowski et al., 1986).
The corresponding region of Rous sarcoma virus p60v-src
has been implicated in defining substrate specificity (Jove
et al., 1986). It is possible that non-catalytic v-fps domains
such as the adjacent SH2 region may be required for the
recognition of specific targets involved in neoplastic
transformation of vertebrate cells.

The phosphorylation of exogenous substrates by
trpE-v-fps bacterial proteins was rather restricted. The
predominant phosphoproteins of labeled bacteria and
bacterial lysates were autophosphorylated trpE-v-fps
proteins or exogenously added enolase, despite the
presence of numerous bacterial proteins (Figure 5a, lanes
1-6; Figure 5b, lanes 1-8). The only E. coli protein to be
consistently phosphorylated was a 165-kDa polypeptide
(Figure 2a, lanes 8 and 9; Figure 5a, lanes 8 and 9) that
contains phosphotyrosine in induced cells (data not
shown). In this regard, bacterial v-fps polypeptides
behaved differently from v-abl proteins expressed in E.
coli, which phosphorylate numerous bacterial proteins at
tyrosine (Wang et al., 1982; Prywes et al., 1985).

In summary, our data suggest that the start of the v-fps
tyrosine kinase catalytic domain is effectively defined by
the ATP-binding site and that the kinase domain can
function as an autonomous unit. The C-terminal 263
residues of P130gag-fps apparently possess all the
information required for substrate binding and phosphotransfer,
although additional N-terminal sequences may be
required for full catalytic activity. Previous work of
Yaciuk & Shalloway (1986) has suggested that the
C-terminus of the src family kinase domain is defined by a
conserved leucine (residue 1173 of P130gag-fps) that is
required for protein stability in vertebrate cells.
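
The figure of 263 residues follows from the construct names, which give the first v-fps residue, and the shared C-terminus at residue 1182. A minimal sketch of that arithmetic, assuming a hypothetical dictionary layout and hypothetical helper names (vfps_length, minimal_active), with the activity flags taken from the Results for the pTdl constructs listed in Table 2:

```python
C_TERMINUS = 1182  # C-terminal residue of P130gag-fps shared by all fusion proteins

# First v-fps residue of each pTdl construct (from the plasmid name), paired with
# whether any kinase activity was detected in the assays described in the Results.
CONSTRUCTS = {
    "pTdl359": (359, True),
    "pTdl823": (823, True),
    "pTdl878": (878, True),
    "pTdl919": (919, True),
    "pTdl920": (920, True),
    "pTdl925": (925, False),
    "pTdl935": (935, False),
    "pTdl938": (938, False),
}


def vfps_length(start, end=C_TERMINUS):
    """Number of v-fps residues retained, counting both end points."""
    return end - start + 1


def minimal_active(constructs):
    """Return (name, length) of the shortest construct that retained activity."""
    active = {name: start for name, (start, is_active) in constructs.items() if is_active}
    # The shortest fragment is the one that starts furthest toward the C-terminus.
    name = max(active, key=active.get)
    return name, vfps_length(active[name])


if __name__ == "__main__":
    print(minimal_active(CONSTRUCTS))
```

The same helper gives the 41-residue difference between the pTdl878 and pTdl919 products discussed above.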

A major obstacle to the biochemical and structural 
analysis of retroviral transforming proteins such as v-fps 
is a lack of material. By the expression of relatively soluble
v-fps polypeptides in E. coli we have acquired a convenient
system for the study of v-fps kinase activity. The
bacterial v-fps proteins are expressed to high levels and
autophosphorylate at physiological sites. We expect that 
the vectors described here will enable us to thoroughly 
study specific regions of the kinase domain by combined 
mutational, biochemical and structural analysis. 



Materials and methods 

Bacteria and plasmids 

The RR1 strain of E. coli (hsd20, ara14, proA2, lacY1, galK2,
rpsL20, xyl5, mtl1, supE44) was used for most transformations
(Bolivar et al., 1977). The construction of the plasmid pJ2,
and of the mutants RX18m, RX15m and AX9m derived from
pJ2 have been described previously (Stone et al., 1984; Sadowski
et al., 1986). FSV DNA sequences used in these constructs
derive from a molecular clone isolated by Shibuya & Hanafusa
(1982).

Construction of bacterial expression plasmids 

Figure 1 summarizes the construction of the plasmids used in 
these experiments. To generate v-fps encoding fragments of 
decreasing size we cut the FSV genome at unique restriction 
enzyme sites and at unique sites created by linker insertion 
mutagenesis. Digestion of wild type FSV with SmaI, PvuII/
SmaI or BspMI/SmaI generated v-fps encoding fragments
starting at amino acid 49, 729 and 893, respectively (Figure 1,
panels a and b). The BspMI/SmaI fragment was made blunt by
end-filling with Klenow enzyme. Digestion of the FSV XhoI
linker insertion mutants RX18m, RX15m and AX9m with
XhoI/HindIII generated fragments encoding v-fps from amino
acids 637, 822 and 833 (panel c). Blunt ended fragments were
cloned into the SmaI site of pUC vectors, while the XhoI/
HindIII fragments were cloned into the SalI/HindIII sites of
pUC8. From these constructs, the EcoRI/HindIII inserts were
subcloned into the proper reading frame of [illegible]
or pATH-11 (Figure 1, panels a-c). A description of the pTF
plasmids has been given elsewhere (Sadowski et al., 1986).

To construct pTdl823, pTdl835 and pTdl878, the pTF729
plasmid was digested with KpnI which cleaves at the codon for
amino acid 821 (Figure 1, bottom of panel a). Linearized DNA
was treated with Bal31 nuclease (New England Biolabs) for 2-
10 min at 30°C. Bal31-digested DNA was cut with EcoRV which
cleaves between the codons for amino acids 42 and 43 of trpE.
The fragments were purified by electrophoresis on 0.7%
agarose. Following end-filling with Klenow and religation, the
plasmids were retransformed into RR1. Clones were screened
by analysis of protein products on 10% polyacrylamide gels.
Those expressing novel proteins of the proper size were
sequenced from double-stranded DNA using the oligonucleotide
primer 5' GAACAAAATTAGAGAATA 3' which
corresponds to nucleotide sequence 142-159 of the E. coli
tryptophan operon (Yanofsky et al., 1981). The construction of
pTdl919, 920, 925 and 935 was as described above except that
pTF893 was linearized with KpnI which cuts in the polylinker
region between the coding sequence for trpE and v-fps (Figure 1,
panel d). Bal31 reactions were for 20 or 30 min using Slow Bal31
(International Biotechnologies Inc.). pTdl938 was constructed
by digestion of pTF729 with NotI and EcoRV. The NotI 5'
protrusion was filled in with Klenow and the vector religated.
The pTdl359 plasmid was constructed by digesting pTF49 with
EcoRV and NcoI, limited S1 nuclease treatment and religation.

Growth of bacteria and induction of protein expression 

Cells containing expression plasmids were grown overnight at
37°C in 1 ml M9 medium supplemented with 0.5% w/vol
casamino acids, 10 µg/ml thiamine, 20 µg/ml tryptophan and
50 µg/ml ampicillin. The overnight culture was diluted 1/10 into
1 ml fresh M9 supplemented with casamino acids, thiamine and
ampicillin and grown for 1 h with strong aeration at 37°C. 5 µl of
1 mg/ml indole acrylic acid (Sigma) were added in ethanol to
induce the tryptophan operon. The cells were grown for a
further 2 h and harvested by centrifugation.

To analyze whole cell protein, the cells were resuspended in
25 µl SDS-urea buffer (10 mM sodium phosphate pH 7.2, 1%
vol/vol 2-mercaptoethanol, 1% w/vol SDS, 6 M urea). The
lysates were incubated at 37°C for 30 min and 25 µl of 2 x SDS
buffer (20% vol/vol glycerol, 10% vol/vol 2-mercaptoethanol,
4.6% w/vol SDS, 0.125 M Tris-HCl pH 6.8) was then added. The
samples were heated at 100°C for 3 min and 15-µl aliquots were
analyzed by electrophoresis on a 7.5% SDS-polyacrylamide gel
as described (Weinmaster et al., 1983).

Metabolic labeling of bacterial and mammalian cells

Cells containing plasmid vectors were grown as described
above. Following 2 h induction with indole acrylic acid, the cells
were harvested by centrifugation and washed twice in 50 mM
Tris-HCl pH 7.5, 10 mM MgCl2, 150 mM NaCl and resuspended
in 50 µl of the same buffer. The suspensions were incubated with
60 µCi 32Pi for 20 min at room temperature. The cells were then
washed 3 x in TEN buffer (50 mM Tris-HCl pH 7.5, 0.5 mM
EDTA, 0.3 M NaCl) and resuspended in 25 µl SDS-urea buffer.
The whole cell lysates were analyzed by electrophoresis through
7.5% SDS-polyacrylamide gels.

Growth of FSV-transformed rat-2 cells and metabolic labeling
with 32Pi were as described (Weinmaster et al., 1983, 1984).
P130gag-fps was immunoprecipitated from radiolabeled cells with
R254E anti-p19gag monoclonal antibody (Ingman-Baker et al.,
1984) and purified by electrophoresis through 7.5% SDS-
polyacrylamide gels.

Protein analysis 

Electrophoretic separation of proteins on SDS-polyacrylamide
gels, isolation of radiolabeled proteins from gels, and subsequent
phosphoamino acid analysis and tryptic phosphopeptide
mapping were carried out exactly as described previously
(Weinmaster et al., 1983, 1984).

Preparation and assay of bacterial lysates for kinase activity 

Bacteria were grown and induced as described above. The
pellets from 1-ml cultures were washed once with TEN buffer
and resuspended in 10 µl E. coli lysis buffer (50 mM Tris-HCl pH
7.5, 0.3 M NaCl, 1% vol/vol NP40, 20 mM MgCl2). 10 µl E. coli
lysis buffer plus 2 mg/ml lysozyme were added and the suspension
was incubated on ice for 30 min. 10 µl of the same buffer
plus 0.3 mg/ml DNAase I was then added and the suspension
held on ice for a further 60 min. For assay of enolase phosphorylation,
10 µl of the lysate were mixed with 5 µg acid-denatured
rabbit muscle enolase and kinase reaction buffer
(100 mM Hepes pH 7.5, 10 mM MnCl2). 5 µCi [γ-32P]ATP were
added in reaction buffer to bring the total volume to 30 µl. The
reactions were incubated at 37°C for 15 min and stopped by the
addition of 30 µl 2 x SDS sample buffer. The samples were
analyzed by electrophoresis through 10% polyacrylamide gels
and autoradiography. For quantitation of trpE-v-fps protein in
each reaction, the Coomassie blue-stained gels were densitometrically
scanned and the integrals from the protein peaks
used to calculate the amount of protein per band.

Phosphorylation of poly(glu,tyr) (80:20) polymer was
assayed using crude lysates as described above. 10 µl crude
lysate were mixed with 10 µg poly(glu,tyr) (Sigma) in kinase
reaction buffer. 2 µCi [γ-32P]ATP were added in reaction buffer
to bring the volume to 30 µl. After 15 min at 37°C the reaction
was stopped by the addition of 1 µl 100 mM cold ATP, 1 µl of
500 mM EDTA and 40 µl 2 x sample buffer (4% vol/vol NP40,
10% vol/vol glycerol, 125 mM Tris-HCl pH 6.8, 0.002% bromophenol
blue). The reactions were electrophoresed on 5%
polyacrylamide gels containing 8 M urea, 2% NP40, 0.325 M
Tris-HCl pH 8.8 (Schieven et al., 1986). The gels were fixed and
washed extensively in 10% acetic acid, 50 mM sodium pyrophosphate.
Gels were dried and autoradiographed prior to excising
the poly(glu,tyr) bands and quantitation of 32P counts. To measure
the amount of trpE-v-fps protein in the poly(glu,tyr) reactions,
10 µl of the bacterial lysate were run on a 10% SDS-
polyacrylamide gel, and the Coomassie blue-stained gel densitometrically
scanned.
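
The normalization implied above, 32P counts in the excised poly(glu,tyr) band divided by the densitometric signal of the trpE-v-fps band, can be sketched as follows. This is only an illustrative sketch: the function name, the optional background subtraction and every number are assumptions made for the example, not values or procedures reported here.

```python
def specific_activity(band_cpm, protein_integral, background_cpm=0.0):
    """Background-corrected 32P counts per unit of densitometric protein signal.

    band_cpm         -- counts from the excised poly(glu,tyr) band
    protein_integral -- integral of the trpE-v-fps peak from the scanned gel
    background_cpm   -- counts from a blank gel slice (an assumed correction)
    """
    if protein_integral <= 0:
        raise ValueError("protein integral must be positive")
    return (band_cpm - background_cpm) / protein_integral


# Hypothetical measurements for two constructs (not data from the paper).
samples = {
    "construct_A": {"band_cpm": 12500.0, "protein_integral": 4.0, "background_cpm": 500.0},
    "construct_B": {"band_cpm": 3500.0, "protein_integral": 2.0, "background_cpm": 500.0},
}

activities = {name: specific_activity(**values) for name, values in samples.items()}

# Express each construct relative to a chosen reference construct.
reference = activities["construct_A"]
relative = {name: value / reference for name, value in activities.items()}
```

Dividing by the protein signal lets constructs that are expressed at different levels be compared on the same scale.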

Quantitation of solubilities of trpE-v-fps fusion proteins

E. coli were grown and induced as described above, except that
40 µCi [35S]methionine was added at the time of induction. The
cells were lysed as above and the lysates centrifuged at 12 000 g
for 20 min. The soluble and insoluble fractions were run on 10%
polyacrylamide gels. The amount of protein in each fraction was
determined by excising the appropriate bands and counting the
incorporated [35S]methionine. Counts from background bands
were subtracted in calculation of protein levels. The identities of
the trpE-v-fps proteins were verified immunologically by
immunoprecipitation of [35S]methionine-labeled clarified lysates
with 4 µl anti-fps rat antiserum.
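
The solubility calculation described above (band counts minus background, compared between the soluble and insoluble fractions) might be expressed as in the sketch below. The function name and the example counts are hypothetical and serve only to illustrate the arithmetic.

```python
def soluble_fraction(soluble_cpm, insoluble_cpm, background_cpm=0.0):
    """Fraction of the fusion protein recovered in the soluble fraction.

    Each argument is a [35S]methionine count from an excised gel band; the
    background count is subtracted from both fractions before comparison.
    """
    soluble = max(soluble_cpm - background_cpm, 0.0)
    insoluble = max(insoluble_cpm - background_cpm, 0.0)
    total = soluble + insoluble
    if total == 0:
        raise ValueError("no counts above background in either fraction")
    return soluble / total


# Hypothetical counts, not data from the paper.
fraction = soluble_fraction(soluble_cpm=8200.0, insoluble_cpm=2200.0, background_cpm=200.0)
```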
Crystal structure of the tyrosine kinase
domain of the human insulin receptor

The X-ray crystal structure of the tyrosine kinase domain of the human insulin receptor has
been determined by multiwavelength anomalous diffraction phasing and refined to 2.1 Å resolution.
The structure reveals the determinants of substrate preference for tyrosine rather than
serine or threonine and a novel autoinhibition mechanism whereby one of the tyrosines that
is autophosphorylated in response to insulin, Tyr 1,162, is bound in the active site.



Insulin activates a number of signalling pathways that regulate
cellular metabolism and growth (reviewed in ref. 1). The insulin
receptor is a transmembrane glycoprotein and a member of
the receptor tyrosine kinase family which includes, among
others, the receptors for epidermal and platelet-derived growth
factors (EGF and PDGF). The extracellular portion of receptors
in this family contains the binding site for its particular protein
ligand and the tyrosine kinase activity resides in the cytoplasmic
portion. In contrast to the EGF and PDGF receptors, which
are monomeric and dimerize upon ligand binding (reviewed in
ref. 4), the insulin receptor is a disulphide-linked α2β2 heterotetramer.
Binding of insulin to the extracellular α-chains is
thought to cause a change within the quaternary structure of
the receptor that results in autophosphorylation of specific tyrosines
in the cytoplasmic portion of the β-chains: two in the
juxtamembrane region (~30 residues C-terminal to the transmembrane
helix), three in the tyrosine kinase domain
(~300 residues), and two in the C-terminal tail (~70 residues).
The nature of the autophosphorylation events, whether cis
(within a β-chain) or trans (between β-chains), has been the
subject of debate.

Autophosphorylated tyrosines of the EGF and PDGF receptors
serve as binding sites for proteins that contain Src-homology-2
(SH2) domains, whereas the phosphotyrosines of an
insulin-receptor substrate, IRS-1, rather than those of the receptor
itself, are the predominant targets for SH2-containing
proteins. Although the roles of the phosphotyrosines in the
juxtamembrane region and the C-terminal tail have not been
fully elucidated, it is well established that autophosphorylation
of the three tyrosines in the kinase domain stimulates kinase
activity towards exogenous substrates. The tyrosine kinase
activity of the insulin receptor has been shown through kinase-inactivating
point mutations to be essential for insulin signal
transduction. A number of nonsense and missense mutations
in the tyrosine kinase region of the gene encoding the insulin
receptor have been identified in patients afflicted with non-insulin-dependent
diabetes mellitus (NIDDM) (reviewed in ref. 17).

We have used the multiwavelength anomalous diffraction
(MAD) phasing method to determine the crystal structure of
the unphosphorylated, apo form of the tyrosine kinase domain
of the insulin receptor (IRK). The structures of several protein
serine/threonine kinases (PSKs) have been reported: cyclic-AMP-dependent
protein kinase (cAPK), cyclin-dependent kinase
2 (CDK2), mitogen-activated protein kinase (MAPK),
and twitchin kinase. The IRK structure reveals those features
that are characteristic of members of the protein tyrosine kinase
(PTK) family.

Structure determination 

A baculovirus/insect cell expression system was used to produce
a 306-residue fragment of the β-chain of the human insulin
receptor which contains tyrosine kinase activity and the autophosphorylation
sites Tyr 1,158, Tyr 1,162 and Tyr 1,163 (L.W.,
S.R.H., W.A.H. and L.E., manuscript submitted; numbering is
according to ref. 3). The N and C termini of the expressed protein
were chosen on the basis of proteolytic studies performed
on the entire cytoplasmic portion (Mr 48K) of the β-chain. The
purified protein was not tyrosine-phosphorylated as judged by
western blotting with anti-phosphotyrosine antibody and autophosphorylation
experiments demonstrating phosphorylation at
three sites upon addition of Mg-ATP.

FIG. 1 a, The experimental, MAD-phased electron density map at 2.5 Å
resolution in the region of the IRK active site, contoured at 1σ. Superposed
is the refined atomic model. b, The corresponding 2Fo-Fc

Attempts to solve the structure of the unphosphorylated, apo
form of IRK by molecular replacement using the cAPK structure
as a search model were unsuccessful. The structure was solved
by the MAD phasing method using a crystal derivatized with
ethylmercuric phosphate, which binds to two cysteine thiols.
Synchrotron diffraction data were collected at three X-ray wavelengths
near the mercury LIII absorption edge. The initial MAD-derived
phases to 2.5 Å Bragg spacings were subsequently
improved through multiple cycles of model building, refinement
and partial model phase combination. The structure has been
refined to 2.1 Å resolution with a crystallographic R-value of
19.6%. The atomic model includes all but the three N-terminal
residues. Electron density maps computed with the MAD-derived
phases and with the phases calculated from the atomic
model are shown in Fig. 1. Details of crystallization, data collection
and analysis, model building and structure refinement are
given in Table 1.
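
For orientation, the three X-ray wavelengths listed in Table 1 can be converted to photon energies with E = hc/λ. The short sketch below does this conversion; the constant is the standard value of hc in keV·Å, the function and variable names are illustrative, and the remote/peak/edge labels are taken from Table 1 as reproduced in this copy.

```python
HC_KEV_ANGSTROM = 12.398419843  # Planck constant times speed of light, in keV·Å


def photon_energy_kev(wavelength_angstrom):
    """Photon energy in keV for an X-ray wavelength given in angstroms."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


# Wavelengths (Å) as listed in Table 1.
wavelengths = {"remote": 0.9793, "peak": 1.0061, "edge": 1.0093}
energies = {label: photon_energy_kev(value) for label, value in wavelengths.items()}

for label, energy in energies.items():
    print(f"{label}: {energy:.3f} keV")
```

A shorter wavelength corresponds to a higher energy, so the 0.9793 Å data set lies furthest above the other two in energy.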

Overall topology

IRK is composed of two lobes with a single connection between
them, similar to the kinase cores of the PSKs whose structures
have been determined (Fig. 2a). The N-terminal lobe comprises
a twisted β-sheet of five antiparallel β-strands (β1-β5)
and one α-helix (αC). The secondary structure nomenclature
follows that for cAPK; a structure-based sequence alignment
is shown in Fig. 3. IRK lacks the short α-helix B which immediately
precedes α-helix C in cAPK, as do the other PSKs with
known structure. The β-sheet cores of IRK and cAPK are very
similar, with a root-mean-square (r.m.s.) deviation of only 0.8 Å
for a superposition of 30 Cα atoms in the five β-strands. Differences
in the N-terminal lobe are found in the loops between β-strands,
including a five-residue insertion between β2 and β3
in IRK, and near the N terminus of αC (Fig. 2c). Although
the first eight residues in the IRK structure (981-988) are
relatively mobile (main-chain B-values >45 Å²), the backbone
conformation is extended rather than α-helical; the latter was
predicted for PTKs possessing the semiconserved Trp(989)-
Glu sequence.
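
The r.m.s. deviation quoted above is the root of the mean squared distance between paired Cα atoms after superposition. The sketch below computes that quantity for coordinates that are assumed to be already superposed; it does not perform the least-squares fit itself, and the coordinates are invented for illustration rather than taken from the IRK or cAPK models.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical, already-superposed Cα positions in Å (each pair is 0.8 Å apart).
set_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
set_b = [(0.0, 0.8, 0.0), (3.8, -0.8, 0.0), (7.6, 0.0, 0.8)]
value = rmsd(set_a, set_b)
```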

The larger C-terminal lobe comprises eight α-helices (αD, αE,
αEF, αF-αJ) and four β-strands (β7, β8, β10, β11). The one-and-a-half-turn
αEF is only a one-turn helix in cAPK and was
not identified as such in the cAPK structure report. IRK lacks
β-strands 6 and 9 of cAPK, although the IRK residues that
correspond to β-strand 6 follow a similar course. A superposition
of 66 Cα atoms in helices αD-αH of IRK and cAPK gives
an r.m.s. deviation of 1.3 Å. In the C-terminal lobe, IRK and
cAPK are most dissimilar in the activation loop, between β8
and αEF, and in the connection between αD and αE. The activation
loop contains the three tyrosine autophosphorylation sites
in IRK and the essential phosphorylation site Thr 197 in cAPK.
In the apo IRK structure, the activation loop (five residues
longer than in cAPK) contains the two short β-strands β10 and
β11 and traverses the cleft between the N- and C-terminal lobes
(Fig. 2d). The paths of the apo IRK and cAPK activation loops
diverge at Gly 1,149 (Thr 183) and reconverge at Pro 1,172
(Thr 201). Residues 1,152-1,157 in the IRK activation loop are
relatively mobile, with main-chain B-values exceeding 45 Å². The
loop connecting αD and αE, known as the kinase insert region
(~100-residue insertion in the PDGF receptor), is ten residues
longer in IRK than in cAPK and contains the sequence
Pro(1,099)-Gly-Arg-Pro-Pro-Pro, which is reminiscent of proline-rich
sequences to which Src-homology-3 (SH3) domains
bind. The three consecutive prolines in the IRK structure adopt
a left-handed polyproline type II helical conformation, as do
the proline-rich peptides in the SH3-peptide complexes whose
structures have been determined. The proline-rich sequence
in IRK is shorter by one helical turn than the canonical SH3-binding
sequence; as yet, no IRK-SH3 interaction has been
reported.

electron density map at 2.1 Å resolution, contoured at 1σ. Figures prepared
with SETOR.

Orientation of N- and C-terminal lobes

The major difference in the global structure of IRK and cAPK
is the relative orientation of the N- and C-terminal lobes. Crystallographic
studies of mammalian cAPK with and without Mg-ATP
and a peptide inhibitor (PKI) have revealed two conformational
states for this kinase, referred to as the open and closed
forms (the closed form being active). In the open form, seen
in the apo and binary (+PKI) structures, the N-terminal lobe is
swung away from the C-terminal lobe by 14° and translated by
0.7 Å compared with the lobes in the closed, ternary (+PKI,
+Mg-ATP) structure. The N-terminal lobe of IRK is rotated
26° and translated 0.8 Å relative to closed-form cAPK and 19°
and 2.1 Å relative to open-form cAPK. In addition to a rotational
component that results in a wider cleft between the two
lobes (axis perpendicular to the page in Fig. 2d), there is also a



TABLE 1 Statistics for data collection, phase determination and refinement

Data collection (20.0 to 2.5 Å)

Wavelength (Å)     Reflections* (N)    Completeness (%)   Signal <I/σ(I)>   Rsym† (%)
0.9793 (remote)    22,529              95.7               22.4              2.6
1.0061 (peak)      22,461 (37,359)‡    95.4 (93.9)        21.3 (20.7)       2.5 (2.7)
1.0093 (edge)      22,346              94.9               20.9              2.5

MAD structure factor ratios§ (observed ratios)

                   20.0 < d < 4.0 Å                                4.0 < d < 2.5 Å
Wavelength (Å)     0.9793          1.0061          1.0093          0.9793          1.0061          1.0093
0.9793             0.035 (0.017)   0.019           0.034           0.037 (0.024)   0.023           0.037
1.0061                             0.054 (0.017)   0.027                           0.057 (0.023)   0.032
1.0093                                             0.050 (0.020)                                   0.057 (0.031)

Scattering factors (e), values in the order in which they appear in this copy:
f'     -21.6    -14.4    -7.7
f''      6.0     10.1     9.7

MAD phasing‖

R(|°FT|) = 0.033      <Δ(Δφ)> = 32.8°      <m> = 0.87
R(|°FA|) = 0.315      <σ(Δφ)> = 14.6°

Refinement¶

Model: 303 residues, 201 water molecules, 2 EMP molecules (2,588 atoms)

Reflections (N)   d-spacings (Å)   R-value (%)   Free R-value (%)
34,747            6.0-2.1          19.6          23.2

R.m.s. deviations:   Bonds (Å)   Angles (°)   B-values (Å²)
                     0.011       1.6          2.2

Expression and crystallization. A recombinant baculovirus was constructed to code for human insulin receptor residues Val 978 to Lys 1,283
(L.W., S.R.H., W.A.H. and L.E., submitted). Two amino-acid substitutions were introduced: Cys 981→Ser and Tyr 984→Phe. The expressed protein
was purified from lysates of baculovirus-infected insect cells (48-h post-infection) on Q-Sepharose, Superdex-200 and Mono-Q columns. Crystals
of apo IRK were grown at 21 °C by vapour diffusion in hanging drops containing equal volumes of 10 mg ml⁻¹ protein solution and the reservoir
solution of 20% polyethylene glycol (PEG) 6000, 0.2 M malate-imidazole, pH 7.5. Macroseeding was required to grow crystals of sufficient size. The
crystals belong to space group P212121 and have unit cell dimensions of a = 54.0 Å, b = 73.0 Å, c = 89.2 Å when frozen. There is one IRK molecule
in the asymmetric unit and the solvent content is 52%, assuming a partial specific volume of 0.73 cm³ g⁻¹. Data collection. All of the MAD data
were obtained from one cryocooled mercury-derivative crystal with approximate dimensions of 0.6 x 0.6 x 0.07 mm. The crystal was soaked in
stabilizing solution (25% PEG 6000, 0.2 M malate-imidazole, pH 7.5) which included 0.1 mM ethylmercuric phosphate (EMP) for ~40 h, after which
it was transferred to stabilizing solutions that successively included 5% and then 10% glycerol. The crystal was flash-cooled in a dry nitrogen
stream at -160 °C. Data were collected on Fuji imaging plates (IPs) at three X-ray wavelengths near the mercury LIII edge at beamline X-4A at the
National Synchrotron Light Source, Brookhaven National Laboratory. The crystal was oriented such that a* was parallel to the oscillation axis.
Bijvoet pairs across the b*c* plane were collected on the same or adjacent IPs. Oscillation ranges of 1.7-1.9° and exposure times of 90-120 s
were used. The exposed IPs were digitized with a Fuji scanner. Data processing. Raw IP data were converted to integrated intensities with DENZO.
ROTAVATA was used to calculate scaling parameters for each IP, and these were applied with a modified version of AGROVATA that does not
merge redundant measurements. The MADSYS program package (W.A.H.) was used to extract phases and figures of merit. The experimental
electron density map was computed using MAD-derived phases for reflections from 20.0 Å to 2.5 Å. Model building and refinement. Fitting of the
polypeptide chain to the electron density was done using FRODO on a Silicon Graphics Iris workstation. Reference was made to the cAPK
structure during model building. COMBIN (W.A.H.) was used to combine the experimental phases with phases calculated from the partial model.
Least-squares and simulated annealing refinement were done using X-PLOR, gradually extending the resolution from 2.5 to 2.1 Å. When the model
was nearly complete, simulated annealing omit maps were computed in which ten consecutive residues were omitted from the calculation. As
defined in PROCHECK, there are no residues in disallowed main-chain torsion angle regions and two residues, Arg 1,131 and Asp 1,156, in
generously allowed regions. Side chains for the following residues were not modelled past Cβ owing to poor electron density: 981, 987, 1,034,
1,047, 1,096, 1,127, 1,153-1,157, 1,159 and 1,237. The average B-value for protein atoms is 23.2 Å². The side chains for Met 1,051 and
Met 1,076 have been modelled in two different conformations. Water molecules whose B-values refined to >50 Å² were omitted from the subsequent
round of refinement. EMP is bound to Cys 1,056 and Cys 1,234.
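
As a consistency check on the crystal parameters quoted above, the stated solvent content can be turned around to give the protein mass per asymmetric unit that it implies. The sketch below uses the unit cell, the 52% solvent content and the partial specific volume from the legend; the count of four asymmetric units per cell is the standard value for P212121, and the function names are illustrative assumptions.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
A3_PER_CM3 = 1.0e24        # cubic angstroms per cubic centimetre


def asu_volume(a, b, c, z):
    """Volume of one asymmetric unit (Å^3) of an orthorhombic cell with z units."""
    return a * b * c / z


def solvent_fraction(v_asu, mass_da, v_bar=0.73):
    """Solvent fraction for a protein of mass_da (g/mol) in one asymmetric unit."""
    protein_volume = mass_da * v_bar * A3_PER_CM3 / AVOGADRO  # Å^3 per molecule
    return 1.0 - protein_volume / v_asu


def implied_mass(v_asu, solvent, v_bar=0.73):
    """Protein mass (Da) per asymmetric unit implied by a given solvent fraction."""
    return (1.0 - solvent) * v_asu * AVOGADRO / (v_bar * A3_PER_CM3)


v_asu = asu_volume(54.0, 73.0, 89.2, z=4)   # P212121 has four asymmetric units per cell
mass = implied_mass(v_asu, solvent=0.52)    # roughly 35 kDa
matthews = v_asu / mass                     # Matthews coefficient, Å^3 per Da
```

The implied mass of about 35 kDa is in the range expected for a fragment of roughly 300 residues, so the quoted cell, solvent content and one molecule per asymmetric unit are mutually consistent.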

* Unique reflections for acentrics in point group symmetry P222 and centrics in Pmmm.

† $R_{\mathrm{sym}} = 100 \times \sum_{hkl}\sum_i |I_i - \langle I \rangle| / \sum_{hkl}\sum_i I_i$, where $I_i$ is the ith measurement and $\langle I \rangle$ is the weighted mean of all measurements of I.

‡ Values in parentheses are for data from 20.0-2.1 Å used for refinement.

§ Table values represent $\langle \Delta|F|^2 \rangle^{1/2} / \langle |F|^2 \rangle^{1/2}$, where $\Delta|F|$ is the absolute value of the Bijvoet difference at one wavelength (diagonal elements)
or of the dispersive difference at two wavelengths (off-diagonal elements). The values in parentheses are the ratios for centric Bijvoet reflections,
which would be equal to zero for perfect data and serve as an estimate of the noise in the anomalous signals. The listed scattering factors are
the refined values for mercury.

‖ $R = \sum_{hkl}\sum_i \big| |F_i| - \langle |F| \rangle \big| / \sum_{hkl} |F|$. °FT is the structure factor due to normal scattering from all the atoms, °FA is the structure factor due to normal
scattering from the anomalous scatterers only, and Δφ is the phase difference between °FT and °FA. Δ(Δφ) is the difference between two independent
determinations of Δφ. Values given are based on calculations that did not include reflections with m = 0; <m> is the mean figure of merit including
reflections with m = 0.

¶ A subset of the data (10%) was excluded from the refinement and used for the free R-value calculation until the final round of refinement, in
which all of the data (F > 2σ) were used. $R\text{-value} = 100 \times \sum_{hkl} \big| |F_o| - |F_c| \big| / \sum_{hkl} |F_o|$. R.m.s. deviation in B-values is for bonded atoms.
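
The two agreement statistics defined in the footnotes can be computed as sketched below. This is an illustration under stated assumptions, not the authors' software: the function names and test data are invented, and an unweighted mean is used for <I> where the footnote specifies a weighted mean.

```python
def r_sym(measurements):
    """Rsym (%) from repeated intensity measurements.

    measurements maps a reflection index (h, k, l) to a list of intensities.
    The mean here is unweighted, a simplification of the weighted mean
    described in the footnote.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in measurements.values():
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return 100.0 * numerator / denominator


def r_value(f_obs, f_calc):
    """Crystallographic R-value (%) from observed and calculated amplitudes."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return 100.0 * numerator / sum(abs(fo) for fo in f_obs)


# Hypothetical data, not taken from Table 1.
example_intensities = {(1, 0, 0): [100.0, 110.0], (0, 1, 0): [50.0, 50.0]}
example_r_sym = r_sym(example_intensities)
example_r = r_value([10.0, 20.0, 30.0], [9.0, 22.0, 30.0])
```

The free R-value in Table 1 uses the same formula as `r_value`, restricted to the 10% of reflections held out of refinement.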
rotational component along the long axis of the molecule (vertical
in Fig. 2d, anticlockwise from above upon opening). The
apo forms of MAPK and twitchin kinase also adopt open conformations
with lobe rotations of 17° and 30° (refs 21 and 22,
respectively), vis-à-vis closed-form cAPK, whereas apo CDK2
adopts a closed conformation.

The N- and C-terminal lobes of IRK are held apart by steric
interactions between conserved Gly 1,005 of the glycine-rich,
nucleotide-binding loop (between β1 and β2) and residues
Phe 1,151 and Gly 1,152 of the kinase-conserved Asp-Phe-Gly
sequence, near the beginning of the activation loop (Figs 2b,
5b). In contrast, the interaction of residues in αC with Asp-



FIG. 2 a, Ribbon diagram of the apo IRK structure. The α-helices are
shown in red, the β-strands in green, the side chains of tyrosines 1,158,
1,162 and 1,163 in yellow, the glycine-rich, nucleotide-binding loop in
orange, the catalytic loop in dark blue, the activation loop in violet, the
P+1 loop in light blue, and the kinase insert region in dark grey. The
termini are denoted by N and C. b, Stereo view of a Cα trace of IRK in
the same orientation as in a. Every tenth residue is marked with a filled
circle and every twentieth residue is labelled. c, Cα trace of the N-terminal
lobes of IRK (residues 993-1,081) and cAPK (residues 40-
125) in which the Cα atoms of the β-sheet cores are superposed. IRK
is in red, cAPK in blue. The view is rotated slightly from that in a. d,
Stereo view of a Cα trace of IRK (residues 981-1,283) and cAPK (residues
40-310) in which the Cα atoms of αD-αH are superposed; colouring
as in c. The activation loop in IRK (residues 1,149-1,170) and the
corresponding residues in cAPK (183-199) are shown in orange and
green, respectively. The view is the same as in b. Brookhaven Protein
Data Bank entry 1ATP for cAPK was used in the Cα superpositions.
Preparation of figures: a made with RIBBONS; b with MOLSCRIPT;
c and d with GRASP.

FIG. 3 Structure-based sequence alignment of IRK, cAPK, and the tyrosine
kinase domains of the EGF, PDGF (β) and FGF (FLG) receptors and
of c-Src and c-Abl. The secondary structure assignments for IRK and
cAPK were obtained using the Kabsch and Sander algorithm as implemented
in PROCHECK. A residue is highlighted if found at that position
in >90% of the tyrosine kinase sequences listed in Hanks. Of these,

Phe-Gly accounts for the open conformation of MAPK. The
disposition of the lobes in apo IRK places kinase-conserved
Lys 1,030, a residue in the N-terminal lobe implicated in ATP
binding, a comparatively large distance from the catalytic residues
in the C-terminal lobe. In IRK the distance from Lys 1,030
to Asp 1,132, the catalytic base, is 13.7 Å (Nζ-Oδ2), compared
with 12.4 Å in MAPK, 11.2 Å in open-form cAPK, and 7.8 Å
in closed-form cAPK.

Active site 

The roles of the highly conserved residues in the protein kinase
family have been largely elucidated from the crystallographic
studies of cAPK complexed with Mg-ATP and PKI
(reviewed in refs 30 and 31). The so-called catalytic loop lies
between β-strands 6 and 7 in cAPK. In this loop, Asp 166
(Asp 1,132) and Asn 171 (Asn 1,137) are nearly invariant in
both the PSK and PTK families, and Lys 168 (Ala 1,134) is
highly conserved in the PSK family. Asp 166 is the catalytic
base in the phosphotransfer reaction, Lys 168 provides charge
neutralization, and Asn 171 is hydrogen-bonded to Asp 166 O
and is involved in Mg2+ coordination. In the PTK family, an
arginine is present in the catalytic loop rather than a lysine,
either two (Src subfamily) or four residues (all other PTKs;
Arg 1,136 in IRK) from the catalytic base (Fig. 3).




the residues that show conservation in the PSK family as well are
shaded, whereas those that are PTK-specific are shown in reverse contrast.
The tyrosine autophosphorylation sites in IRK and the essential
phosphorylation site in cAPK are marked with an asterisk. No attempt
was made to align the PTK sequences between αD and αE and between
β2 and β3.

One of the most striking features of the apo IRK structure
is the presence of Tyr 1,162, a tyrosine that is autophosphorylated
upon insulin binding, in the active site of the enzyme,
seemingly poised for cis-autophosphorylation (Fig. 4a, b). The
hydroxyl group of the Tyr 1,162 phenolic ring is hydrogen-bonded
to the carboxylate group of the catalytic base,
Asp 1,132 (Oη-Oδ2: 2.6 Å), and to the guanidinium group
of Arg 1,136 (Oη-Nε: 2.9 Å). The phenolic ring is oriented
in part by the pyrrolidine ring of PTK-conserved Pro 1,172.
The distance between the centroid of the pyrrolidine ring and
the edge of the phenolic ring is 3.6 Å. Arginine 1,136 makes
several other contacts: salt bridges to the carboxylate groups
of Asp 1,132 and Asp 1,161, and an axial polar interaction
with the indole ring of Trp 1,175 (Nη1-centroid of six-membered
ring: 3.1 Å).

The catalytic loop conformations of IRK and cAPK are very
similar despite several sequence differences (Fig. 4b). A superposition
of the catalytic loops reveals that Arg 1,136 Nε is the
counterpart to Lys 168 Nζ in cAPK. The arginine in the catalytic
loop of PTKs appears to serve a dual purpose: it provides charge
neutralization at the phosphotransfer site (like Lys 168 in
cAPK), and makes a hydrogen bond (via Nη2) with Oδ1 of the
catalytic base, which in cAPK is hydrogen-bonded to
Thr 201 Oγ1. Either a threonine or serine is found in the PSK

FIG. 4 a, Stereo view of the active site of IRK. Highlighted side-chain
and main-chain atoms are colour-coded by atom type: carbon, yellow;
nitrogen, light blue; oxygen, red. All other atoms are shown in dark
green. Hydrogen-bonding interactions are shown by white lines. The
view for a and b is approximately perpendicular to that in Fig. 2a, looking
along the long axis of the molecule from the N- to the C-terminal lobe.
b, Stereo view of the active sites of apo IRK and ternary cAPK in which
the catalytic loops are superposed. IRK is in red, cAPK in blue, ATP and
PKI from ternary cAPK in orange and green, respectively. Alanine 21 of
PKI was changed to serine with the side-chain dihedral angle set to
-60°. The labelled IRK residues and the corresponding cAPK residues
(in parentheses) are Asp 1,132 (Asp 166), Ala 1,134 (Lys 168),
Arg 1,136 (Glu 170), Asn 1,137 (Asn 171), Phe 1,151 (Phe 185),
Met 1,153, Tyr 1,158, Tyr 1,162, Pro 1,172 (Thr 201) and Trp 1,175
(Tyr 204). Because of side-chain disorder, only Cβ of Met 1,153 is
shown. Figures drawn with GRASP.





FIG. 5 a, Missense mutations in NIDDM patients mapped onto the IRK
structure. The mutations, whose positions are shown in red, are
Arg 993→Gln, Gly 1,008→Val, Ala 1,048→Asp, Lys 1,068→Glu,
Arg 1,131→Gln, Ala 1,134→Thr, Ala 1,135→Glu, Met 1,153→Ile,
Arg 1,164→Gln, Arg 1,174→Gln, Pro 1,178→Leu, Trp 1,193→Leu and
Trp 1,200→Ser. b, Mapping of the highly conserved residues in the PTK
family. In green are residues that show conservation in the PSK family
as well, and in red are residues that are PTK-specific (as in Fig. 3). PTK-specific
residues are labelled. The positions of conserved glycines are
indicated by colouring (green or red) of the backbone representation.
Figures drawn with GRASP.

family at cAPK position 201, whereas a proline (Pro 1,172) is
found in the PTK family.

Although the hydroxyl group of Tyr 1,162 is in position for 
phosphotransfer with respect to the catalytic loop residues, the 
ATP-binding site appears to be inaccessible, precluding cis-auto- 
phosphorylation of Tyr 1,162. When the catalytic loops of apo 
IRK and ternary cAPK are superposed, Gly 1,152 and 
Met 1,153 of the IRK activation loop intersect with (superposed) 
ATP near the α- and β-phosphates (Fig. 4b). Furthermore, the 
side chain of Phe 1,151 occupies the hydrophobic pocket in 
which the adenine ring of ATP sits in ternary cAPK. Val 57 and 
Leu 173 are situated on either side of the adenine ring in 
ternary cAPK, and in apo IRK the corresponding residues, 
Val 1,010 and Met 1,139 (most often a leucine in the PTK 
family), flank the phenolic ring of Phe 1,151. In both the open 
and closed forms of cAPK, the phenolic ring of Phe 185 
(Phe 1,151) resides in a hydrophobic environment formed by 
Leu 95 (Met 1,051) and Tyr 164 (His 1,130). The corresponding 
pocket in the apo IRK structure is largely unoccupied, but is 
presumably filled by Phe 1,151 in the phosphorylated, activated 
form of IRK. 

Tyrosine substrate selectivity 

The polypeptide chain in the near vicinity of Tyr 1,162 appears 
to mimic the way in which a peptide substrate binds; Tyr 1,162 
is an autophosphorylation site (P-site), albeit in trans (discussed 
below), and the interactions of Asp 1,161 (P-1 residue) and 
Tyr 1,163 (P+1 residue) are consistent with observed substrate 
specificities for IRK (see below). Therefore, the determinants 
of tyrosine versus serine/threonine substrate selectivity can be 
addressed from this apo structure. As seen in the superposition 
in Fig. 4b, the hydroxyl groups of Tyr 1,162 and Ser 21 of 
PKI (serine substituted for alanine at the P-site) occupy nearly 
the same position. Clearly, the hydroxyl group of a serine or 
threonine side chain at this position in IRK (and in a peptide 
substrate) would be too short to reach the phosphotransfer 
site. 

The position of the main chain at Tyr 1,162 is determined by 
the loop that contains residues Leu 1,171 to Ala 1,177, referred 
to as the P+1 loop (in cAPK, the loop with which the P+1 
residue of PKI interacts). Alignments of protein kinase 
sequences have revealed that the sequence corresponding to 
Pro 1,172 to Trp 1,175 in IRK is characteristic of the PTK 
family. The two main-chain interactions that govern the dis- 
tance from the Cα atom of Tyr 1,162 to the phosphotransfer 
site are hydrogen bonds between Tyr 1,163 N and Leu 1,171 O 
and between Tyr 1,163 O and Leu 1,171 N (Fig. 4a). Similar 
main-chain interactions are present between the P+1 residue of 
PKI and Gly 200 (Leu 1,171) in ternary cAPK. The active site 
superposition in Fig. 4b shows that the conformation of the 
P+1 loop, especially near Pro 1,172 (Thr 201), is an important 
determinant in substrate selectivity. 

The conformation of the IRK P+1 loop is stabilized by a 
number of interactions. Leu 1,171, Val 1,173 and Met 1,176 
form a hydrophobic cluster together with Leu 1,181 and 
Leu 1,219. Arg 1,174 Nη1 and Nη2 are hydrogen-bonded to 
Pro 1,209 O and Leu 1,213 O in the connection between αF 
and αG. An arginine or a lysine is found at position 1,174 
in all but the Tyk/Jak subfamily of PTKs. Conserved 
Trp 1,175 plays a fundamental role in coupling the P+1 loop 
to the catalytic loop. In addition to the direct, axial polar 
interaction with Arg 1,136, there is an indirect interaction 
mediated by Glu 1,201; Glu 1,201 Oε2 is hydrogen-bonded to 
both Trp 1,175 Nε1 and Ala 1,135 N (Fig. 4a). Glu 1,201 is 
highly conserved in the PTK family, less so in the PSK family. 
The primary coupling between the P+1 and catalytic loops of 
cAPK is the interaction of Thr 201 with Asp 166 and Lys 168, 
which positions the P+1 loop nearer to the catalytic loop than in 
IRK, contributing to serine/threonine versus tyrosine substrate 
selection. 

Peptide substrate specificity 

The interactions of Asp 1,161 (P-1 residue) and Tyr 1,163 (P+1 
residue) with other IRK residues correlate well with observed 
IRK substrate preferences. Asp 1,161 is salt-bridged to Lys 1,085 
and Arg 1,136 and hydrogen-bonded to Gln 1,208 (Fig. 4a). 
Lys 1,085 and two other positively charged residues, Arg 1,089 
and Lys 1,092, all lie along the same face of αD. The latter two 
residues are potential contacts for negatively charged P-2 and 
P-3 residues of a peptide substrate. The side chain of Tyr 1,163 
(also an autophosphorylation site) resides in a hydrophobic 
environment formed by Val 1,173 and Leu 1,219, and the hyd- 
roxyl group is hydrogen-bonded to the carboxylate group of 
Glu 1,216. IRS-1 is phosphorylated on at least eight tyrosines 
by the insulin receptor. Four of these tyrosines are preceded 
by an aspartate or glutamate and all eight are followed by a 
hydrophobic residue. Furthermore, in a study of PTK substrate 
specificities using an in vitro combinatorial approach, the pre- 
ferred peptide substrate for IRK contained the sequence Glu- 
Glu-Glu-Tyr-Met-Met-Met (Z. Songyang and L. Cantley, 
unpublished results). 
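
The substrate preference described in this section (an acidic residue immediately before the phosphorylated tyrosine and a hydrophobic residue immediately after it) can be illustrated with a minimal Python sketch. The function name, the single-letter residue sets, and the decision to count tyrosine as hydrophobic at P+1 are assumptions made for illustration only; they are not part of the original analysis.

```python
# Illustrative residue classes in single-letter code (assumed for this sketch).
ACIDIC = set("DE")
HYDROPHOBIC = set("AVLIMFWY")


def matches_irk_preference(peptide: str, tyr_index: int) -> bool:
    """Return True if P-1 is acidic and P+1 is hydrophobic around a tyrosine."""
    if not 0 < tyr_index < len(peptide) - 1:
        return False  # both a P-1 and a P+1 residue are required
    if peptide[tyr_index] != "Y":
        return False  # the P-site must be a tyrosine
    p_minus_1 = peptide[tyr_index - 1]
    p_plus_1 = peptide[tyr_index + 1]
    return p_minus_1 in ACIDIC and p_plus_1 in HYDROPHOBIC


# Preferred peptide quoted in the text: Glu-Glu-Glu-Tyr-Met-Met-Met
print(matches_irk_preference("EEEYMMM", 3))
# IRK's own Asp 1,161 - Tyr 1,162 - Tyr 1,163 segment
print(matches_irk_preference("DYY", 1))
```

Under these assumed residue classes, both the quoted peptide and the Asp-Tyr-Tyr segment of the activation loop satisfy the rule, which is a simplification of the structural contacts described above.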

IRK mutations in NIDDM 

The missense mutations in the tyrosine kinase portion of the 
insulin receptor gene that have been found in patients with 
NIDDM are mapped onto the IRK structure in Fig. 5a. Many 
of these mutations involve replacement of hydrophobic residues 
with others of different size or with charged residues. At the 
highly conserved Gly 1,008 position, no room for a valine side- 
chain exists owing to the proximity of the main chain of residues 
1,031-1,033. In αC, Ala 1,048 Cβ sits in a hydrophobic pocket 
which could not accommodate the larger aspartate side chain. 
Similarly, Ala 1,134→Thr and Ala 1,135→Glu in the catalytic 
loop and Pro 1,178→Leu at the start of αEF would result in 
steric clashes. The two tryptophan mutations, Trp 1,193→Leu 
and Trp 1,200→Ser, apparently destabilize the hydrophobic 
packing in the C-terminal lobe by introducing side chains of 
smaller size. These two tryptophans, along with PTK-conserved 
tryptophans 1,175 and 1,246, form a cluster of four tryptophans 
in this region of the C-terminal lobe. 

Four Arg→Gln mutations have been identified. Both Nη1 and 
Nη2 of the Arg 993 guanidinium group are hydrogen-bonded to 
the carbonyl oxygen of Pro 1,071 in the loop between β4 and 
β5, thereby stabilizing that region of the N-terminal lobe. 
Pro 1,071 is a cis proline, the only one in the structure, and is 
reasonably well conserved in the PTK family. In the PTK 
sequences examined, there is a strong correlation between pro- 
line at position 1,071 and arginine at position 993 (Fig. 3). The 
guanidinium group of Arg 1,131 makes no electrostatic contacts 
in the present structure, yet this residue is invariant in the PTK 
family. Based on the cAPK structure, it is expected that in the 
phosphorylated form of IRK, Arg 1,131 will be salt-bridged to 
one of the phosphotyrosines in the activation loop (see below). 
As already discussed, the guanidinium group of Arg 1,174 in the 
P+1 loop is hydrogen-bonded to Pro 1,209 O and Leu 1,213 O. 
Loss of the latter interaction upon substitution with a glutamine 
probably destabilizes the conformation of the P+1 loop, which 
is important in substrate positioning. The present structure does 
not provide an explanation for the adverse effects of 
Arg 1,164→Gln, nor of Met 1,153→Ile and Lys 1,068→Glu. 
Insights into the two activation loop mutations are expected 
from a structure of the phosphorylated form of IRK. 

Other conserved residues 

A number of highly conserved residues in the PTK family (Figs 
3, 5b) do not play a direct role in catalysis, but are important 
instead in structure stabilization. Many of these residues are also 
conserved in the PSK family. As in cAPK, Glu 1,179 (Glu 208) 
is salt-bridged to Arg 1,253 (Arg 280), and Asp 1,191 (Asp 220) 
is hydrogen-bonded to the backbone amide groups of His 1,130 
(Tyr 164) and Arg 1,131 (Arg 165) of the catalytic loop. The 
Glu 1,047 (Glu 91) side chain is not salt-bridged to Lys 1,030 
(Lys 72), but rather is disordered. Glycines 1,082, 1,152 and 
1,225 have backbone torsion angles less favourable for Cβ-con- 
taining residues. Gly 1,196 is in an α-helical conformation, but 
a Cβ atom at this position would result in steric clashes with 
the side chains of Cys 1,245 and Trp 1,246. The imidazole group 
of His 1,130 is involved in two electrostatic interactions: Nδ1 is 
hydrogen-bonded to the backbone amide group of Asp 1,132 
(Asp 166), and Nε2 is hydrogen-bonded to a well ordered water 
molecule which in turn is hydrogen-bonded to Asn 1,137 O 
(Asn 171) and Asp 1,150 Oδ2 (Asp 184). 

Conserved, PTK-specific residues that play a role in structure 
stabilization (and not mentioned previously) include Gly 1,119, 
Met 1,120, Ser 1,190, Cys 1,245 and Phe 1,256. A Cβ atom at 
Gly 1,119 in αE would result in a steric clash with the carbonyl 
oxygen of His 1,058. Met 1,120 and Phe 1,256 form part of the 
hydrophobic environment in which Leu 1,133 of the catalytic 
loop is situated. The hydroxyl group of Ser 1,190 is hydrogen- 
bonded to both the amide nitrogen and carbonyl oxygen of 
Thr 1,187, as well as to a water molecule which is also hydrogen- 
bonded to Asp 1,191 Oδ2. The environment of the Cys 1,245 side 
chain in αH is hydrophobic, yet it is hydrogen-bonded to the 
carbonyl oxygen of Leu 1,241, also in αH (Sγ-O: 3.4 Å). The 
same hydrogen-bonding interaction for a buried cysteine in an 
α-helix was observed in myohaemerythrin. 

Cis-inhibition and trans-activation 

The apo IRK structure reveals a novel mechanism of autoinhibi- 
tion in which the hydroxyl group of Tyr 1,162 is bound in the 
active site. Several PSKs, including protein kinase C, calmodu- 
lin-dependent protein kinase II and myosin light-chain kinase, 
contain within the same polypeptide chain a pseudosubstrate 
sequence that blocks access to the active site. Pseudosubstrate 
sequences resemble exogenous substrate sequences, with the P- 
site serine or threonine most often replaced by alanine. In con- 
trast, IRK possesses not a pseudosubstrate but a bona fide sub- 
strate known to be autophosphorylated in response to insulin. 

Although evidence for cis-autophosphorylation has been 
reported, our autophosphorylation experiments on IRK 
(L.W., S.R.H., W.A.H. and L.E., manuscript submitted) and 
those of others on the insulin receptor cytoplasmic domain and 
intact receptor are consistent with a trans-autophosphorylation 
reaction, including the first phosphorylation event. Moreover, 
studies on the receptors for EGF, PDGF and fibroblast growth 
factor (FGF) indicate that autophosphorylation occurs via a 
trans mechanism. Based on these results and the observation 
that the ATP-binding site is blocked in the unphosphorylated 
IRK structure, we conclude that the binding of ATP and of 
'self' Tyr 1,162 in the active site are mutually exclusive, such 
that cis-autophosphorylation of Tyr 1,162 does not occur to any 
appreciable extent. Further evidence for this binding exclusivity 
comes from limited tryptic digestion experiments on IRK which 
demonstrate that the activation loop is more readily proteolysed 
in the presence of a non-hydrolysable ATP analogue (data not 
shown). 

A model for insulin receptor activation emerges from the apo 
IRK and ternary cAPK structures and the biochemical data. 
In solution an equilibrium exists between two activation loop 
conformations, one in which Tyr 1,162 is engaged in the active 
site and both substrate and ATP-binding sites are inaccessible, 
as seen in the present structure, and the other in which Tyr 1,162 
is disengaged and both binding sites are accessible. The former 
conformation is highly favoured in the unphosphorylated state. 
When Tyr 1,162 is disengaged, Mg-ATP can bind if it is present, 
and the kinase is then transiently active until Tyr 1,162 returns to 
the active site. In the absence of insulin, the disposition of the 
receptor's cytoplasmic domains is such that trans-autophos- 
phorylation is prevented. When insulin binds to the α-chains, a 
change within the quaternary structure of the receptor places 
the phosphorylation sites of one β-chain within reach of the 
active site of the other β-chain, the juxtamembrane tether pro- 
viding the required flexibility. Intramolecular, trans-autophos- 
phorylation can then occur when Tyr 1,162 is disengaged and 
Mg-ATP is bound. If phosphorylation occurs on an activation 
loop tyrosine, the loop equilibrium is shifted towards a non- 
inhibiting conformation (Tyr 1,162 disengaged from the active 
site), which is stabilized by specific electrostatic interactions 
involving the phosphotyrosine. The result is a significant increase 
in kinase activity (up to 200-fold for the tri-phosphorylated state 
in vitro; L.W., S.R.H., W.A.H. and L.E., manuscript submitted). 

The activation loop of virtually all PTKs contains from one 
to three tyrosines, one of which can be readily aligned in 
sequence with Tyr 1,162 (Fig. 3). Also, Arg 1,131 is invariant in 
the PTK family and the equivalent arginine in cAPK, Arg 165, 
is salt-bridged to phosphorylated Thr 197 (P-Thr 197) in the 
cAPK activation loop. We propose that phosphorylation of 
Tyr 1,162 is the key step in insulin receptor kinase activation, 
whereby P-Tyr 1,162 will be salt-bridged to Arg 1,131, stabilizing 
the non-inhibiting conformation of the activation loop. P- 
Tyr 1,162 may also interact with Arg 1,155, as this residue is 
either an arginine or lysine in nearly all PTKs and the corre- 
sponding lysine in cAPK (Lys 189) interacts with P-Thr 197. The 
functions of P-Tyr 1,158 and P-Tyr 1,163 in activation are less 
clear. A crystal structure of the phosphorylated form of IRK 
should provide insights into the role of phosphorylation in kin- 
ase activation. 

We believe that the main aspect of this cis-inhibition/trans- 
activation mechanism will apply to many PTK family members. 
In the unphosphorylated state, the activation loop tyrosine cor- 
responding to Tyr 1,162 will be bound in the active site and, 
concomitantly, ATP binding will be blocked. Autophosphoryla- 
tion of this tyrosine in trans will stabilize the non-inhibiting 
conformation of this activation loop through electrostatic inter- 
actions between the phosphotyrosine and positively charged 
residues. Activation of the insulin receptor and its sub- 
family members will differ from that of other receptor 
PTKs in the mechanism of signal transduction: ligand-induced 
intramolecular rearrangement rather than ligand-induced 
oligomerization. □ 



Received 12 October, accepted 17 November 1994. 

1. White, M. F. & Kahn, C. R. J. biol. Chem. 269, 1-4 (1994). 
2. Ullrich, A. et al. Nature 313, 756-761 (1985). 
3. Ebina, Y. et al. Cell 40, 747-758 (1985). 
4. Ullrich, A. & Schlessinger, J. Cell 61, 203-212 (1990). 
5. Tornqvist, H. E. et al. J. biol. Chem. 263, 350-359 (1988). 
6. Tavaré, J. M., O'Brien, R. M., Siddle, K. & Denton, R. M. Biochem. J. 253, 783-788 (1988). 
7. White, M. F., Shoelson, S. E., Keutmann, H. & Kahn, C. R. J. biol. Chem. 263, 2969-2980 (1988). 
8. Kohanski, R. A. Biochemistry 32, 5773-5780 (1993). 
9. Villalba, M. et al. Proc. natn. Acad. Sci. U.S.A. 86, 7848-7852 (1989). 
10. Shoelson, S. E., Boni-Schnetzler, M., Pilch, P. F. & Kahn, C. R. Biochemistry 30, 7740-7746 (1991). 
11. Cobb, M. H., Sang, B.-C., Gonzalez, R., Goldsmith, E. & Ellis, L. J. biol. Chem. 264, 18701-18706 (1989). 
12. Frattali, A. L., Treadway, J. L. & Pessin, J. E. J. biol. Chem. 267, 19521-19528 (1992). 
13. Koch, C. A., Anderson, D., Moran, M. F., Ellis, C. & Pawson, T. Science 252, 668-674 (1991). 
14. White, M. F. Curr. Opin. Genet. Dev. 4, 47-54 (1994). 
15. Ebina, Y. et al. Proc. natn. Acad. Sci. U.S.A. 84, 704-708 (1987). 
16. Chou, C. K. et al. J. biol. Chem. 262, 1842-1847 (1987). 
17. Accili, D. et al. J. Endocr. Invest. 15, 857-864 (1992). 
18. Hendrickson, W. A. Science 254, 51-58 (1991). 
19. Knighton, D. R. et al. Science 253, 407-414 (1991). 
20. De Bondt, H. L. et al. Nature 363, 595-602 (1993). 
21. Zhang, F., Strand, A., Robbins, D., Cobb, M. H. & Goldsmith, E. J. Nature 367, 704-711 (1994). 
22. Hu, S.-H. et al. Nature 369, 581-584 (1994). 
23. Veron, M., Radzio-Andzelm, E., Tsigelny, I., Ten Eyck, L. F. & Taylor, S. S. Proc. natn. Acad. Sci. U.S.A. 90, 10618-10622 (1993). 
24. Ren, R., Mayer, B. J., Cicchetti, P. & Baltimore, D. Science 259, 1157-1161 (1993). 
25. Musacchio, A., Saraste, M. & Wilmanns, M. Nature struct. Biol. 1, 546-551 (1994). 
26. Yu, H. et al. Cell 76, 933-945 (1994). 
27. Zheng, J. et al. Protein Science 2, 1559-1573 (1993). 
28. Zheng, J. et al. Biochemistry 32, 2154-2161 (1993). 
29. Bossemeyer, D., Engh, R. A., Kinzel, V., Ponstingl, H. & Huber, R. EMBO J. 12, 849-859 (1993). 
30. Wei, L., Hubbard, S. R., Smith, R. F. & Ellis, L. Curr. Opin. struct. Biol. 4, 450-455 (1994). 
31. Taylor, S. S. & Radzio-Andzelm, E. Structure 2, 345-355 (1994). 
32. Burley, S. K. & Petsko, G. A. FEBS Lett. 203, 139-143 (1986). 
33. Hanks, S. K., Quinn, A. M. & Hunter, T. Science 241, 42-52 (1988). 
34. Sheriff, S., Hendrickson, W. A. & Smith, J. L. J. molec. Biol. 197, 273-296 (1987). 
35. Kemp, B. E. & Pearson, R. B. Biochim. biophys. Acta 1094, 67-76 (1991). 
36. Otwinowski, Z. in Data Collection and Processing (eds Sawyer, L., Isaacs, N. & Bailey, S.) 56-62 (SERC Daresbury Laboratory, Warrington, UK, 1993). 
37. CCP4: A Suite of Programs for Protein Crystallography (SERC Collaborative Computing Project no. 4, Daresbury Laboratory, Warrington, UK, 1979). 
38. Jones, T. A. Meth. Enzym. 115, 157-171 (1985). 
39. Brünger, A. T. X-PLOR, Version 3.1, A System for X-ray Crystallography and NMR (Yale Univ. Press, New Haven, USA, 1992). 
40. Laskowski, R. A., MacArthur, M. W., Moss, D. S. & Thornton, J. M. J. appl. Crystallogr. 26, 283-291 (1993). 
41. Evans, S. V. J. molec. Graph. 11, 134-138 (1993). 
42. Zheng, J. et al. Acta crystallogr. D49, 362-365 (1993). 
43. Carson, M. J. molec. Graph. 5, 103-106 (1987). 
44. Kraulis, P. J. J. appl. Crystallogr. 24, 946-950 (1991). 
45. Nicholls, A., Sharp, K. A. & Honig, B. Proteins Struct. Funct. Genet. 11, 281-296 (1991). 
46. Kabsch, W. & Sander, C. Biopolymers 22, 2577-2637 (1983). 
47. Hanks, S. K. Curr. Opin. struct. Biol. 1, 369-383 (1991). 

ACKNOWLEDGEMENTS. We thank H. Yamaguchi, M. Cuff, L. Shapiro and C. Ogata for help in 
synchrotron data collection; N. McDonald and H. Yamaguchi for discussion; J. Schlessinger for 
critical reading of the manuscript; N. Belgado for tissue culture; A. Nicholls for help in the use 
of GRASP; and L. Ten Eyck for making the coordinates of mammalian, binary cAPK available. 
This work was supported in part by grants from the NIH (W.A.H. and L.E.), the NSF (W.A.H.) and 
the W. M. Keck Foundation (L.E.). L.W. is supported by a Juvenile Diabetes Foundation inter- 
national postdoctoral fellowship. Beamline X-4A at the National Synchrotron Light Source, a 
DOE facility, is supported by the Howard Hughes Medical Institute. Coordinates have been 
deposited in the Brookhaven Protein Data Bank. 



LETTERS TO NATURE 



Evidence for magnetic-field- 
induced anisotropy of the 
interstellar medium 

K. M. Desai*, C. R. Gwinn* & P. J. Diamond† 

* Physics Department, University of California, Santa Barbara, 
California 93106, USA 

† National Radio Astronomy Observatory, PO Box O, Socorro, 
New Mexico 87801, USA 

Turbulence in the interstellar medium transfers energy from 
parsec-sized regions to much smaller scales, and may be respon- 
sible for supporting clouds against gravitational collapse. Fluctua- 
tions in the electron density, which trace turbulence, occur on scales 
ranging from 10^ to >10 cm — the largest range of spatial scales 
seen in natural turbulence. Despite almost thirty years of study, 
however, the causes and effects of interstellar turbulence are still 
poorly understood. Here we present observations of OH masers 
in the Galactic star-forming complex W49N, which we use as point 
sources to investigate scattering along the line of sight. The masers' 
images are elliptical, and aligned roughly perpendicular to the 
Galactic plane. This alignment suggests that the magnetic field of 
our Galaxy influences interstellar turbulence by mediating the 
transfer of energy from large to small spatial scales. 

We chose to observe the OH masers in W49N because this 
11-kpc line of sight samples interstellar turbulence over a long 
path through the plane of the Galaxy, and so yields its average 
properties. Moreover, the expected angular broadening due to 
scattering of ~45 milliarcseconds (mas), as extrapolated from 
measurements of H2O masers in W49N (ref. 4) and from OH 
masers on other lines of sight towards other directions, is fairly 
well matched to the excellent imaging capabilities of the recently 
completed Very-Long-Baseline Array (VLBA). Because masers 
are extremely bright, pointlike objects, their scatter-broadened 
images, known as scattering disks, are unaffected by their 
intrinsic structure, or sensitivity limitations. Often many masers 
lie close together in star-forming regions, so that many close but 
distinct lines of sight can be studied simultaneously. 

We observed the emission from the W49N OH masers at 
1,667 MHz on 13 March 1993, using very-long-baseline inter- 
ferometry at eight VLBA antennas and one antenna of the Very 
Large Array (VLA). We processed the data with the Mark II 
processor of the National Radio Astronomy Observatory to 
obtain cross-power spectra for each antenna pair, with spectral 
resolution corresponding to 0.156 km s⁻¹ in Doppler velocity. 
We analysed these data with the AIPS software package. We 
used observations of unresolved continuum sources and bright 
maser features to remove instrumental and atmospheric gain 
and phase variations at each antenna. We analysed, and discuss 
in this Letter, only frequency channels in which confusing emis- 
sion from OH masers in the neighbouring source W49S was 
absent, and in which masers were detected at five or more times 
the noise level. We fit elliptical models for scattering disks 
directly to the measured cross-power spectra, and determined 
the number and location of maser spots empirically. 

The scattering disks of the masers are elongated, and are 
approximately aligned perpendicular to the Galactic plane. Fig- 
ure 1 shows the observed distribution of maser spots on the 
sky. The contours for each maser spot at half of the maximum 
brightness are shown, magnified eight times for clarity. The mean 
position angle of the minor axes is parallel to the Galactic plane 
to within 5°. The alignment of the spots is highly significant, 
with some evidence for systematic variation across the source. 
Figure 2 shows the fitted major- and minor-axis sizes, with error 
bars from post-fit residuals. The axial ratios range from 1.5:1 




Minor axis (mas) 

FIG. 1 The observed distribution of maser spots on the sky, showing 
~2:1 elongation, and alignment of minor axes with the Galactic plane, 
for all maser spots detected at >5 times the noise level. The average 
Galactic magnetic field runs along lines of constant Galactic latitude, 
from lower right to upper left. Contours are 50% of peak brightness 
(magnified eight times for clarity). We note a systematic variation of 
position angles of the major axes with increasing Galactic latitude. The 
crossed ellipse at the lower right of the figure indicates the (also eight 
times magnified) effective resolution element oriented 40° away from 
the Galactic plane with an axial ratio of 1.4:1, obtained from our many 
interferometer baselines. (Dec, declination; RA, right ascension; b, Gal- 
actic latitude; l, Galactic longitude). 

A view of Venus based on Magellan radar 
maps, which covered 98% of the planet's sur- 
face: 'gaps' were filled with data from previous 
US and Soviet missions and the Arecibo radio 
telescope. Geological features and crater densi- 
ty patterns revealed by such images provide 
strong evidence for recent volcanic and tectonic 
activity, suggesting that Venus is probably still 
active. Pages 756 and 729. (NASA/JPL). 

Going down 

The relative densities of compo- 
nents of the Earth's crust and 
mantle are important factors in 
attempts to predict the dynam- 
ics of the Earth's interior. Syn- 
thetic MORB (mid-ocean-ridge 
basalt) glass, subjected to pres- 
sures characteristic of the lower 
mantle, is shown on page 767 
to transform to a perovskite- 
bearing assemblage that should 
be more dense than the sur- 
rounding mantle. This confirms 
the expectation that if subduct- 
ed slabs penetrate into the 
lower mantle, they will be able 
to sink without hindrance. 

Insulin receptor 

The crystal structure of the pro- 
tein tyrosine kinase domain of 
the human insulin receptor has 
been determined to 2.1 Å reso- 
lution. The overall structure is 
similar to cAMP-dependent pro- 
tein kinase, CDK2 and MAP 
kinase. The structural factors 
determining tyrosine specificity 
and its novel autoinhibitory 
mechanism throw light on the 
way in which the insulin receptor 
and other protein tyrosine kinas- 
es signal. Pages 746 and 726. 

Sun hazard 

A study of the induction of skin 
cancers by sunlight shows that 
it both initiates and promotes 
tumours. Initiation occurs when 
UV light induces mutations in 
the p53 gene, present in most 
human skin cancers. But p53 is 
also responsible for the apopto- 
sis of skin cells with damaged 
DNA, better known as sunburn, 
and sunlight can therefore allow 
the selective expansion of cells 
with mutated p53. Thus a steep 
increase in incidence of squa- 
mous cell carcinoma of the skin 
is likely in the next few years in 
individuals now aged 30-50 
who had excessive exposure to 
sunlight when young. Pages 773 
and 730. 



Legal matters 

The injunction granted in a UK 
court last month allowing Chiron 
exclusive rights to market a 
hepatitis C virus test is good 
news for pharmaceutical compa- 
nies. In the third of the series 
'Science and the Law', Moss 
and Cohen argue that the news 
should be more generally wel- 
comed. Page 814. 

Pain centre 

The existence of a specific 
nucleus in the human brain 
responsible for the perception 
of pain and temperature sensa- 
tion was postulated in 1911 on 
the basis of the characteristics 
of patients suffering from thala- 
mic pain syndrome. Now Craig 
et al. have used anterograde 
tracing and immunohistochemi- 
cal staining to identify this 
nucleus in the posterior thala- 
mus of man and the macaque 
monkey, supporting the idea of 
a central representation of pain. 
Page 770. 

Gadolinium about 

A search for the general prin- 
ciples that govern the formation 
of carbon nanotubes filled with 
metallic compounds by arc 
discharge reveals that the most 
impressive continuous 'nano- 
wires' are obtained with metals 
in which an incomplete elec- 
tronic shell is present in the 
most stable oxidation state, 
with chromium and gadolinium 
proving the best for the pur- 
pose. Above, part of a long 
(1 μm) nanotube containing 
crystalline chromium carbide. 
Distance between parallel 
fringes of graphitic tubes is 
0.34 nm. Pages 761 and 731. 

Abstract 

Protein kinases with a conserved catalytic domain make up one of the largest 'superfamilies' of 
eukaryotic proteins and play many key roles in biology and disease. Efforts to identify and classify all 
the members of the eukaryotic protein kinase superfamily have recently culminated in the mining of 
essentially complete human genome data. 



Phosphorylation by protein kinases is recognized as a major 
mechanism by which virtually every activity of eukaryotic 
cells is regulated, including proliferation, gene expression, 
metabolism, motility, membrane transport, and apoptosis. 
An ultimate goal of research into signal transduction is to 
reach a full understanding of the protein phosphorylation 
events that occur within individual cell types and how they 
eventually impact on cell behavior. A milestone en route to 
this ambitious goal is a determination of the number of 
protein kinases encoded by eukaryotic genomes and an 
assessment of their structures, functions, and evolutionary 
relationships. This article traces the progress made toward 
achieving these objectives in the pregenomic and genomic 
eras, which culminated recently with reports on the 'full 
complement' of human protein kinases. 

The pregenomic era 

About sixteen years ago, while working at the Salk Institute, 
my colleagues and I undertook a comparative analysis of all 
the available sequences of protein kinase catalytic domains 
[1]. This interest stemmed from my having identified several 
novel human protein kinases using a homology-based cDNA 
cloning strategy [2] and wanting to determine their relation- 
ships to other known protein kinases. In collaboration with 
the Salk's resident protein kinase guru Tony Hunter and bio- 
computing specialist Anne Marie Quinn, we aligned the 
homologous catalytic-domain amino-acid sequences of 65 
distinct protein kinases from diverse eukaryotes (including 
45 nonorthologous vertebrate enzymes) and constructed a 
phylogenetic tree to visualize their overall relationships [1]. 
The alignment (produced manually at the word-processor) 
defined the boundaries of the eukaryotic protein kinase 
(ePK) catalytic domain, revealed conserved subdomains that 
were never interrupted by amino-acid insertions, and identi- 
fied highly conserved individual amino acids and motifs 
(Figure 1). 

The phylogenetic tree revealed major clusters including the 
tyrosine kinases (the TK group), cyclic nucleotide- and 
calcium-phospholipid-dependent kinases (the AGC group; 
including the PKA, PKG, and PKC families) and calmodulin- 
dependent kinases (the CAMK group). These groupings indi- 
cated that ePK domain phylogeny reflects substrate 
specificity and/or mode of regulation and could therefore 
serve as a useful classification tool. Over the next 7 years I 
continued to add new sequences to the alignment as they 
became available and to construct phylogenetic trees as a 
means of classifying the burgeoning ePK superfamily. By 
early 1994, the ePK domain alignment had grown to contain 
390 sequences including 205 non-orthologous vertebrate 
ePKs, and a fourth major ePK group (CMGC, comprising the 
CDK, MAPK, GSK, and CLK families) had been added 
through phylogenetic analysis [3]. The 390 ePK domain 
alignment was made publicly available through the Protein 
Kinase Resource website [4]. 

[Figure labels: Amino-terminal lobe (ATP binding); Carboxy-terminal lobe (peptide binding and phosphotransfer)] 

Figure 1 

The ePK catalytic domain. The 12 conserved subdomains are indicated by Roman numerals. The positions of amino-acid residues and motifs highly 
conserved throughout the ePK superfamily are indicated above the subdomains, using the single-letter amino-acid code with x as any amino acid. Crystal 
structures show that ePK domains adopt a common fold consisting of amino-terminal and carboxy-terminal lobes connected by a hinge region. Binding of 
Mg-ATP is largely the function of the amino-terminal lobe and hinge region, while peptide-substrate binding is mediated by the carboxy-terminal lobe. 
Particularly important for catalytic function are the invariant lysine in subdomain II and the invariant aspartate in subdomain VII that function to anchor 
and orient ATP, and the invariant aspartate in subdomain VIB which is the likely catalytic base in the phosphotransfer reaction. More detailed discussions 
of ePK subdomains and conserved residues in relation to crystal structures and catalytic function can be found in [3,4,12,13]. 



The genomic era 

By 1995, with the advent of genome-sequencing projects, the
task of cataloging and classifying the members of the ePK
superfamily had grown to become too distracting from my
funded research and I discontinued my efforts in this area.
Tony Hunter continued to work with bioinformaticians at
SUGEN, Inc. (including Greg Plowman, Gerard Manning,
and Sucha Sudarsanam) to characterize the full ePK comple-
ments of model eukaryotes from genomic sequence data
[5,6]. By the time of a recent report [7], their efforts had
resulted in the identification and classification of 115 distinct
ePKs from budding yeast (around 2% of all genes), 434 from
Caenorhabditis elegans (about 2.5% of all genes), and 223
from Drosophila. In addition they described the comple-
ment of 'atypical protein kinases' (aPKs) from these species:
15 from yeast, 20 from C. elegans, and 16 from Drosophila.
(The aPKs are a variety of protein kinases that lack strong
sequence similarity to the classical ePK domain but have
been shown experimentally to have protein kinase activity;
well-known examples are the 'lipid kinases' of the phos-
phatidylinositol 3'-kinase (PI3K) family, some of which have
been shown experimentally to have protein kinase activity.)

As a result of their comprehensive analyses of 'kinomes', the
SUGEN investigators were able to define three new major
groups within the broad ePK classification scheme: first, the
STE group, which includes ePKs that function in the MAPK
kinase cascades that were first described through characteri-
zation of yeast sterile mutants; second, the CK1 group,
including the casein kinase 1 family and related enzymes,
which is greatly expanded in the worm; and third, the TKL
('tyrosine-kinase like') group that includes the STKR family
of TGFbeta serine/threonine kinase receptors and is phylo-
genetically close to the tyrosine kinases (TKs). Many distinct
kinase families within the AGC, CAMK, CMGC, STE, and
CK1 groups have representatives from all three species, sup-
porting the idea of an early evolutionary origin and critical
function in basic cellular processes. Members of the TK and
TKL groups are notably absent from yeast, consistent with
the known functions of these ePKs in intercellular signaling
events associated with metazoan complexity. More discussion
of the evolutionary relationships among the ePKs identified
through the SUGEN genome-mining efforts has been pub-
lished elsewhere [7]. The SUGEN kinase.com website [8]
includes links to all their published work on protein kinase
analysis as well as 'KinBase', a very useful searchable database
that holds information on all the protein kinase genes found
in the yeast, worm, fly, and human (see below) genomes.

Human protein kinases 

The completion of the first draft of the human genome
sequence presented an opportunity to determine the full
complement of human protein kinases. The first analysis
came from a group led by Mitch Kostich at Schering-Plough
Research Institute (SPRI) [9]. This group mined public
GenBank records (available before December, 2001) for ePK
sequences by performing BLAST searches using known ePK
domains as queries. The resulting hits were consolidated, and
efforts were made to remove non-human sequences, pseudo-
genes, and poor-quality sequences that could represent dupli-
cate hits. The SPRI investigators chose to err on the side of
inclusion rather than exclusion, however, and many cases of
'single hit' sequences were retained. Their effort culminated
in a collection of 510 potentially unique human ePKs. A color-
coded alignment that accompanied their article [9] nicely
illustrates the ePK domain sequence conservation.

The SUGEN group, led by Gerard Manning and Sucha
Sudarsanam, carried out a more comprehensive effort to
describe and classify all human ePKs [10]. They employed a
dataset that included, in addition to the public databases,
genomic reads from Celera that are not publicly available and
non-public expressed sequence tags (ESTs) from Incyte and
SUGEN, and they searched using a hidden Markov model of
the ePK domain that allowed detection of very divergent
family members. The sequence data were further searched
for members of the various known aPK families. Using strin-
gent criteria to eliminate false positives (including verifica-
tion of novel sequences by cDNA cloning) they compiled a
list of 478 human members of the ePK superfamily and
another 40 aPKs, bringing their human kinome total to 518
(approximately 1.7% of all predicted human genes). They
also identified 106 ePK or aPK pseudogenes.

A comparison of the SPRI-510 and SUGEN-518 lists reveals
474 protein kinases in common (see the additional file avail-
able with this article online). Of the 44 SUGEN-specific
kinases, 32 are aPKs; the other 8 aPKs identified by SUGEN,
from the ABC1 and RIO families, were included in the SPRI
list as a result of their having weak ePK domain similarity.
Of the remaining 12 SUGEN-specific ePKs, five (TAK1,
MLKL, NEK5, SgK307, and TBCK) were not available in the
public data used in the SPRI analysis; another five (SgK196,
SgK223, SgK424, SgK493, and Slob) have rather divergent
ePK domains that lack many of the highly conserved residues
and are unlikely to have catalytic activity, so it is easy to see
how these might have been excluded by visual inspection;
and the final two are SgK110 and NEK10. SgK110 was actually
detected by the SPRI search, but it was erroneously merged
with a related sequence AC008735_EPK1 (SgK069) on the
same genomic contig; and it is unclear why the SPRI group
missed NEK10. Most, if not all, of the 36 SPRI-specific ePKs
represent over-inclusion errors (Table 1): 14 correspond to
sequences determined to be pseudogenes by the SUGEN
group; 19 are based on single sequences that are (or appear to
be) either poor-quality duplicates of other ePKs or inter-
species contaminants; and the remaining three are duplicates
arising by virtue of non-overlapping partial sequences.
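
The reconciliation of the two lists is simple bookkeeping and can be checked mechanically. The following is a minimal Python sketch that uses only the counts quoted in the paragraph above; the variable and function names are illustrative assumptions and do not come from either group's analysis.

```python
# Counts quoted in the text (SPRI-510 vs SUGEN-518 comparison).
SPRI_TOTAL = 510    # SPRI list of putative human ePKs
SUGEN_TOTAL = 518   # SUGEN human kinome (478 ePKs + 40 aPKs)
SHARED = 474        # kinases present in both lists
SUGEN_APKS = 40     # aPKs in the SUGEN list

sugen_only = SUGEN_TOTAL - SHARED
spri_only = SPRI_TOTAL - SHARED

# Breakdown of the list-specific entries as described in the text.
sugen_only_breakdown = {
    "aPKs": 32,
    "ePKs absent from the public data": 5,
    "ePKs with divergent domains": 5,
    "merged or missed (SgK110, NEK10)": 2,
}
spri_only_breakdown = {
    "pseudogenes": 14,
    "single-sequence duplicates or contaminants": 19,
    "duplicates from non-overlapping partial sequences": 3,
}


def breakdown_matches(expected, breakdown):
    """Return True when the category counts add up to the expected total."""
    return sum(breakdown.values()) == expected


print("SUGEN-specific:", sugen_only, breakdown_matches(sugen_only, sugen_only_breakdown))
print("SPRI-specific:", spri_only, breakdown_matches(spri_only, spri_only_breakdown))
# aPKs that also appear in the SPRI list (the ABC1 and RIO families).
print("aPKs shared with SPRI:", SUGEN_APKS - sugen_only_breakdown["aPKs"])
```

Under these stated counts every category total agrees with the figures given in the text.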

Thus the SUGEN compilation of 478 human ePK superfamily
genes represents the accurate count based on current
sequence data. If one subtracts those that lack key conserved
residues, we are left with 428 human ePKs with known or
likely kinase function (Table 2), 99% of which were included
in the SPRI list; 365 of these fall within the seven major ePK
groups: TK, 84 in total; CAMK, 66; AGC, 61; CMGC, 61; STE,
45; TKL, 37; and CK1, 11. The remaining 63 are in the 'Other'
category, falling outside the main ePK group branches. Krupa
and Srinivasan [11] have also recently searched the public
human genome data with a focus on identifying functional
protein kinases; their efforts resulted in a list of 448 distinct
human ePK sequences, but around 90 of these appear to rep-
resent duplicate entries, and no novel protein kinases were
identified that were not present in the SUGEN compilation.
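
The group totals quoted above can be tallied in a few lines. This is a minimal Python sketch based only on the numbers stated in the text; the dictionary layout and names are illustrative assumptions.

```python
# Per-group counts of human ePKs with known or likely kinase function,
# as quoted in the text (see also Table 2).
group_counts = {
    "TK": 84,
    "CAMK": 66,
    "AGC": 61,
    "CMGC": 61,
    "STE": 45,
    "TKL": 37,
    "CK1": 11,
}
OTHER = 63                # ePKs outside the seven major groups
EPK_SUPERFAMILY = 478     # all human ePK superfamily genes (SUGEN)

in_major_groups = sum(group_counts.values())
functional_total = in_major_groups + OTHER
lacking_key_residues = EPK_SUPERFAMILY - functional_total

print("In the seven major groups:", in_major_groups)
print("With known or likely kinase function:", functional_total)
print("Lacking key conserved residues:", lacking_key_residues)

# Share of the functional ePKs contributed by each group, largest first.
for name, count in sorted(group_counts.items(), key=lambda kv: -kv[1]):
    print(f"{name:5s} {count:3d}  {100 * count / functional_total:5.1f}%")
```

The difference between 478 and 428 is derived here rather than stated in the text, so it should be read as an implied figure.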

Usefulness of the kinome data 

Knowing the full complement of ePK family members and
functional ePKs encoded by eukaryotic genomes will have 
great impact upon many areas of scientific investigation. As 
mentioned above, an obvious benefit relates to understanding 



Table 1

Putative ePKs identified by SPRI but not SUGEN

Category                          SPRI name                Comment

14 pseudogenes                    LOC95530                 Corresponds to SUGEN [illegible]
                                  AC00[illegible]014_EPK1  Corresponds to SUGEN [illegible]
                                  AC023095_EPK1            Corresponds to SUGEN [illegible]
                                  AC024933_EPK1            Corresponds to SUGEN [illegible]
                                  AC091554_EPK1            Corresponds to SUGEN SRPK2ps
                                  AI621045                 Corresponds to SUGEN SgK384ps
                                  AI991174                 Corresponds to SUGEN TSSKps2
                                  AL138964_EPK1            Corresponds to SUGEN MARKps23
                                  AL161450_EPK1            Corresponds to SUGEN CK1g2ps
                                  AL390061_EPK1            Corresponds to SUGEN MNK1ps
                                  BG742738                 Corresponds to SUGEN TLK2ps2
                                  LOC65348                 Corresponds to SUGEN CDK7ps
                                  LOC65743                 Corresponds to SUGEN PAK2ps
                                  LOC85643                 Corresponds to SUGEN CLK2ps

11 duplicates from poor-quality   AI342911                 Corresponds to SUGEN [illegible]
single-EST sequences              AA565761                 Corresponds to SUGEN CLK3
                                  AI826899                 Corresponds to SUGEN CDK4
                                  AW148845                 Corresponds to SUGEN ILK
                                  AW674733                 Corresponds to SUGEN ILK
                                  AA730234                 Corresponds to SUGEN MAP2K7
                                  AI075923                 Corresponds to SUGEN TSSK4
                                  AI206530                 Corresponds to SUGEN TLK1
                                  AW518044                 Corresponds to SUGEN MPSK1
                                  AL538014                 Corresponds to SUGEN BRSK2
                                  BG986727                 Corresponds to SUGEN Obscn

Four non-human sequences          AC025598                 Mus musculus MAP2K6
                                  AC026940_EPK1            Mus musculus CDK7
                                  PKCfTB                   Rattus norvegicus PKCeta
                                  CAMKII                   Sus scrofa sequence, subsequently recalled

Three duplicates from non-        AC023837_EPK1            Part of EphA6
overlapping partial sequences     AA263006                 Part of HIPK1 (along with BE180036)
                                  BE567816                 Part of YANK1 (along with BG036777)

Four others                       AW499744                 Single EST; does not seem to encode an ePK
                                  AJ336398_EPK1            Single genomic sequence, possibly prokaryotic
                                  AL512402_EPK1            Single genomic sequence, possibly nonhuman
                                  ILK2                     Possibly murine ILK; published as human [16]

Table 2

The 428 human ePKs with known or likely kinase catalytic function

Group  Number        Family        Number         Family members
       within group                within family  (SUGEN nomenclature)

TK     84            Ack           2              ACK, TNK1
                     Abl           2              ABL, ARG
                     Csk           2              CSK, CTK
                     FAK           2              FAK, PYK2
                     Fer           2              FER, FES
                     JAK           4              JAK1, JAK2, JAK3, TYK2
                     Src           11             BLK, BRK, FGR, FRK, FYN, HCK, LCK, LYN, SRC, SRM, YES
                     Syk           2              SYK, ZAP70
                     Tec           5              BMX, BTK, ITK, TEC, TXK
                     Alk           2              ALK, LTK
                     Axl           3              AXL, MER, TYRO3
                     DDR           2              DDR1, DDR2
                     EGFR          3              EGFR, HER2/ErbB2, HER4/ErbB4
                     Eph           12             EphA1, EphA2, EphA3, EphA4, EphA5, EphA6, EphA7, EphA8,
                                                  EphB1, EphB2, EphB3, EphB4
                     FGFR          4              FGFR1, FGFR2, FGFR3, FGFR4
                     InsR          3              [illegible]
                     Lmr           3              LMR1, LMR2, LMR3
                     Met           2              MET, RON
                     Musk          1              MUSK
                     PDGFR/VEGFR   8              FLT3, FMS, KIT, PDGFRa, PDGFRb, FLT1, FLT4, KDR
                     Ret           1              RET
                     Ror           2              ROR1, ROR2
                     Sev           1              ROS
                     Tie           2              TIE1, TIE2
                     Trk           3              TRKA, TRKB, TRKC

AGC    61            PKA           5              PKACa, PKACb, PKACg, PRKX, PRKY
                     PKG           2              PKG1, PKG2
                     PKC           9              PKCa, PKCb, PKCd, PKCe, PKCg, PKCh, PKCi, PKCt, PKCz
                     AKT           3              AKT1, AKT2, AKT3
                     DMPK          7              CRIK, DMPK1, DMPK2, MRCKa, MRCKb, ROCK1, ROCK2
                     GRK           7              BARK1, BARK2, GPRK4, GPRK5, GPRK6, GPRK7, RHOK
                     MAST          5              MAST1, MAST2, MAST3, MAST4, MASTL
                     NDR           4              LATS1, LATS2, NDR1, NDR2
                     PKB           1              PDK1
                     PKN           3              PKN1, PKN2, PKN3
                     RSK           9              MSK1, MSK2, RSK1, RSK2, RSK3, RSK4, SgK494, p70S6K, p70S6Kb
                     SGK           3              [illegible]
                     YANK          3              YANK1, YANK2, YANK3

CAMK   66            CAMK1         5              CaMK1a, CaMK1b, CaMK1d, CaMK1g, CaMK4
                     CAMK2         4              CaMK2a, CaMK2b, CaMK2d, CaMK2g
                     CAMKL         20             AMPKa1, AMPKa2, BRSK1, BRSK2, CHK1, HUNK, LKB1, MARK1, MARK2, MARK3,
                                                  MARK4, MELK, NIM1, NuaK1, NuaK2, PASK, QIK, QSK, SIK, SNRK
                     DAPK          5              DAPK1, DAPK2, DAPK3, DRAK1, DRAK2
                     DCAMKL        3              DCAMKL1, DCAMKL2, DCAMKL3
                     MAPKAPK       5              MAPKAPK2, MAPKAPK3, MAPKAPK5, MNK1, MNK2
                     MLCK          4              caMLCK, skMLCK, smMLCK, SgK085
                     PHK           2              PHKg1, PHKg2
                     PIM           3              PIM1, PIM2, PIM3
                     PKD           3              PKD1, PKD2, PKD3
                     PSK           1              PSKH1
                     RAD53         1              CHK2
                     Trio          4              Obscn, SPEG, Trad, Trio

Table 2 (continued)

Group  Number        Family        Number         Family members
       within group                within family

                     TSSK          5              SSTK, TSSK1, TSSK2, TSSK3, TSSK4
                     CAMK-Unique   1              STK33

CMGC   61            CDK           20             CCRK, CDC2, CDK2, CDK3, CDK4, CDK5, CDK6, CDK7, CDK8, CDK9, CDK10,
                                                  CDK11, CHED, CRK7, PCTAIRE1, PCTAIRE2, PCTAIRE3, PFTAIRE1, PFTAIRE2, PITSLRE
                     MAPK          14             Erk1, Erk2, Erk3, Erk4, Erk5, Erk7, JNK1, JNK2, JNK3, NLK, p38a, p38b, p38d, p38g
                     GSK           2              GSK3A, GSK3B
                     CLK           4              CLK1, CLK2, CLK3, CLK4
                     CDKL          5              CDKL1, CDKL2, CDKL3, CDKL4, CDKL5
                     [illegible]   10             DYRK1A, DYRK1B, DYRK2, DYRK3, DYRK4, HIPK1, HIPK2, HIPK3, HIPK4, PRP4
                     [illegible]   3              [illegible]
                     [illegible]   3              [illegible]

STE    45            STE7          7              MAP2K1, MAP2K2, MAP2K3, MAP2K4, MAP2K5, MAP2K6, MAP2K7
                     STE20         28             GCK, HPK1, KHS1, KHS2, LOK, MST1, MST2, MST3, MST4, MYO3A, MYO3B, OSR1,
                                                  PAK1, PAK2, PAK3, PAK4, PAK5, PAK6, SLK, STLK3, TAO1, TAO2, TAO3, YSK1,
                                                  ZC1/HGK, ZC2/TNIK, ZC3/MINK, ZC4/NRK
                     STE11         8              MAP3K1, MAP3K2, MAP3K3, MAP3K4, MAP3K5, MAP3K6, MAP3K7, MAP3K8
                     STE-Unique    2              COT, NIK

TKL    37            IRAK          2              IRAK1, IRAK4
                     LISK          4              LIMK1, LIMK2, TESK1, TESK2
                     LRRK          2              LRRK1, LRRK2
                     MLK           9              DLK, HH498, LZK, MLK1, MLK2, MLK3, MLK4, TAK1, ZAK
                     RAF           3              ARAF, BRAF, RAF1
                     RIPK          5              ANKRD3, RIPK1, RIPK2, RIPK3, SgK288
                     STKR          12             ACTR2, ACTR2B, ALK1, ALK2, ALK4, ALK7, BMPR1A, BMPR1B, BMPR2, MISR2,
                                                  TGFbR1, TGFbR2

CK1    11            CK1           7              CK1a, CK1a2, CK1d, CK1e, CK1g1, CK1g2, CK1g3
                     TTBK          2              TTBK1, TTBK2
                     VRK           2              VRK1, VRK2

Other  63            Aur           3              AurA, AurB, AurC
                     Bub           1              BUB1
                     Bud32         1              [illegible]
                     CAMKK         2              [illegible]
                     CDC7          1              CDC7
                     [illegible]   2              CK2a1, CK2a2
                     [illegible]   [illegible]    IKKa, IKKb, IKKe, TBK1
                     IRE           2              IRE1, IRE2
                     [illegible]   1              [illegible]
                     [illegible]   [illegible]    [illegible]
                     [illegible]   11             NEK1, NEK2, NEK3, NEK4, NEK5, NEK6, NEK7, NEK8, NEK9, NEK10, NEK11
                     NKF1          [illegible]    [illegible]
                     NKF2          1              PINK1
                     NKF4          2              CLIK1, CLIK1L
                     PEK           4              GCN2, HRI, PEK, PKR
                     PLK           4              PLK1, PLK2, PLK3, PLK4
                     TLK           2              TLK1, TLK2
                     TOPK          1              PBK
                     TTK           1              TTK
                     ULK           4              Fused, ULK1, ULK2, ULK3
                     VPS15         1              PIK3R4
                     WEE           3              MYT1, Wee1, Wee1B
                     Wnk           4              Wnk1, Wnk2, Wnk3, Wnk4
                     Other-Unique  2              KIS, SgK496

[Figure 2 graphic: group-specific consensus motifs for the TK, AGC, CAMK, CMGC, STE, TKL, and CK1 groups; residue-level detail is not recoverable from the scan]

glutamate in the 'APE' motif) - two regions that have been
recognized as being primarily involved in peptide-substrate 
recognition [12,13]. A number of group-specific differences 
are apparent (highlighted in Figure 2) that correlate with 
unique peptide-recognition tendencies for the ePKs that fall 
within a given group [14]. Beyond sequence analysis, the 
kinome data will allow for the development of comprehen- 
sive tools (such as full-length cDNAs, microarrays, antibod- 
ies, and fusion protein and RNAi constructs) that will 
greatly aid laboratory investigations aimed at understand- 
ing cell signaling through analysis of kinase function. As an 
example of such proteomic approaches to the study of 
protein kinases, nearly all yeast protein kinases have been 
expressed in bacteria and analyzed for their ability to phos- 
phorylate an array of protein or peptide substrates using 
protein-chip technology [15]. Finally, the human kinome 
data will have benefits in the understanding and treatment 
of human diseases. The ePK genes that map within disease 
loci are attractive etiological candidates, and knowledge of 
the full repertoire of human protein kinases will greatly aid 
in the development of drugs that target specific protein 
kinases or protein kinase families whose function con- 
tributes to disease-associated cellular defects. 



Figure 2 

Conserved residues implicated in peptide-substrate recognition. 
Consensus motifs for the catalytic loop region in subdomain VIB and 
activation loop region in subdomain VIII were determined for the 
members of each of the seven major ePK groups with known or likely 
kinase activity. Invariant residues at a given position are indicated by single 
upper-case letters. Two upper-case letters at a single position indicate 
that either of two residues are strictly conserved, the most frequent 
shown in the top row. Positions in which more than two amino acids are 
present are indicated with lower-case letters; a single letter indicates that 
only one residue is highly conserved, two letters indicate that either of 
two residues are frequently conserved (most frequent on the top row), 
and 'x' indicates poor positional conservation. Residues highlighted in
outline are notably conserved within an ePK group and are thought to
function in the recognition of peptide substrates specifically targeted by
the members of the group. 



of how signal transduction pathways evolved during the 
course of eukaryotic evolution. Both SUGEN [10] and Krupa 
and Srinivasan [11] extended their analyses to describe 
other domains present in the various human ePKs which 
are likely to function in directing the enzymes to relevant
substrates or modulating kinase activities. Further analysis 
of the ePK domain sequences uniquely conserved within the 
major groups and families, together with comparisons of 
ePK domain crystal structures, should ultimately allow a 
full understanding of how different classes of peptide sub- 
strate are recognized. For example, Figure 2 shows consen- 
sus sequences for the catalytic loop region in subdomain 
VIB (which includes the invariant aspartate thought to 
function as the catalytic base) and the activation loop region 
in subdomain VIII (which includes the highly conserved 

The protein kinases of budding yeast:
six score and more 

Tony Hunter and Gregory D. Plowman 

The completion of the budding yeast genome sequencing project has 
made it possible to determine not only the total number of genes, but 
also the exact number of genes of a particular type. As a consequence,
we now know exactly how many protein kinases are encoded by the yeast 
genome, a number of considerable interest because of the importance of 
protein phosphorylation in the control of so many cellular processes. 



BUDDING YEAST has 113 conventional
protein kinase genes, corresponding to
~2% of the total genes (see Table I in
centrefold). More than 60% of these
protein kinases have either known or
suspected functions; the remainder are
novel, and functional analysis awaits. In
terms of defined functions encoded by
the yeast genome, protein kinases come
a close second behind transcription
factors.

What can be learnt from knowing
all the protein kinases encoded by a
single eukaryotic genome? One obvious
outcome is that it is possible to say
whether a protein kinase identified and
characterized in another organism has
a homologue in budding yeast. Such a
homologue might either have a known
function in yeast, or its function can be
tested by genetic studies. Equally im-
portant is the recognition of protein
kinase subfamilies present in higher
eukaryotes that are absent from yeast.
Whereas all eukaryotes have similar
requirements for DNA replication, tran-
scription, translation and energy metab-
olism, it is reasonable to expect that
there might be many protein kinases
unique to multicellular organisms that
function in cellular communication,
both between cells, tissues and the en-
vironment, such as protein-tyrosine ki-
nases. Conversely, there might be pro-
tein kinases unique to budding yeast.
growth, either singly or in combination
(H. Madhani and G. Fink, pers. commun.).

The putative protein kinase YKL161C
clusters with the MAP kinases (Fig. 1) and
might be another MAP kinase (although
it has a KxY rather than TxY motif in
the activation loop), but similarly, this
gene is not essential for pseudohyphal
development. As Ste7 normally phos-
phorylates the Thr and Tyr in the TxY
motif in the activation loop of Fus3 and
Kss1, one possibility is that Ste7 might
phosphorylate and activate another
protein kinase with a related activation-
loop motif. Three other protein kinases,
Kin3/Npk1, Ssn3/Srb10 and Ime2, have
a TxY activation-loop sequence and
might, in principle, be regulated by Ste7
phosphorylation. Ssn3/Srb10 is a cyclin-
dependent kinase (CDK) and is unlikely
to be a Ste7 target. Ime2, however, has
motifs characteristic of proline-directed
protein kinases, like the MAP kinases,
and is on the same major branch as the
MAP kinases (Fig. 1), making it a poten-
tial candidate, although Ime2 is normally
only produced during meiosis.
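
The TxY pattern discussed above (a threonine, any residue, then a tyrosine) is easy to scan for in a protein sequence. The following is a minimal Python sketch, assuming sequences are plain strings in the single-letter amino-acid code; the function name and the short test strings are hypothetical toy inputs, not real kinase sequences, and a hit only flags a candidate site, since the motif is meaningful only within the activation loop.

```python
import re

# Lookahead so that overlapping occurrences (e.g. "TTYY") are all reported.
# "." stands for the variable residue "x" in TxY.
TXY_PATTERN = re.compile(r"(?=(T.Y))")


def find_txy(sequence):
    """Return (0-based position, matched tripeptide) for every TxY occurrence."""
    return [(m.start(), m.group(1)) for m in TXY_PATTERN.finditer(sequence.upper())]


# Toy fragments for illustration only.
toy_fragments = {
    "txy_like": "AAATEYAAA",   # contains a TxY tripeptide (TEY)
    "kxy_like": "AAAKEYAAA",   # KxY instead of TxY, so no hit is expected
}

for name, fragment in toy_fragments.items():
    print(name, find_txy(fragment))
```

A real analysis would first locate subdomain VIII in an alignment and restrict the search to that region.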

With regard to the protein kinases
lying upstream of the MAP kinases,
there are four MAPKKs (Ste7, Mkk1,
Mkk2 and Pbs2: STE7/MEK family), and
four MAPKKKs (Ste11, Bck1, Ssk2 and
Ssk22: STE11/MEKK family). These have
been assigned to the Fus3+Kss1, Mpk1
or Hog1 MAP kinase pathways by geneti-
cal and biochemical analyses. This
leaves both the Smk1 MAP kinase,
which is required for spore wall assem-
bly, and YKL161C without their own
specific MAPKK and MAPKKK. Possibly,
a MAPKKK/MAPKK combination used
for one of the other MAP kinase path-
ways is also used for these predicted
MAP kinases. Interestingly, however, the
Smk1 pathway might have a specific
MAPKKK kinase, as the Sps1 protein
kinase (NRK/MESS family), which lies
upstream of Smk1 (Ref. 8), is related to
Ste20, a MAPKKK kinase in the Fus3
MAP kinase pathway.

The Ste20 family. Ste20 is a Cdc42-
activated protein kinase that is required
in the pheromone response MAP kinase
pathway upstream of Ste11 (Refs 10, 11).
A ste20 mutant is viable and only shows
a defect in response to mating phero-
mone. Cla4 is a second member of the
Ste20 family, which also interacts with
Cdc42, and has been implicated in polar-
ized cell growth, budding and cyto-
kinesis, but a cla4 mutant is viable de-
spite a cytokinesis defect.

A double ste20/cla4 mutant, however, 
cannot undergo cytokinesis, implying 

Copyright © 1997, Elsevier Science Ltd. All rights reserved. 0968-0004/97/$17.00 PII: S0968-0004(96)10068-2



Multiple alignment and parsimony
analysis of catalytic domain sequences
(Fig. 1) categorizes the yeast protein ki-
nases into subfamilies based on structural
relatedness. From such a classification,
one can infer functional similarities, in-
cluding regulation of catalytic activity,
substrate specificity and cellular localiz-
ation. This information is of particular
value for understanding the function of
the numerous uncharacterized yeast
open reading frames that exhibit protein
kinase motifs. One can also determine
which protein kinase subfamilies are
conserved or expanded in other organ-
isms and which are unique to yeast.

MAP kinase pathways 

Pseudohyphal development. One of the
virtues of knowing all yeast protein ki-
nases is that it delimits the number of
protein kinases in a particular subfam-
ily. For example, five members of the
mitogen-activated protein (MAP) kinase
family, Fus3, Kss1, Hog1, Mpk1/Slt2 and
Smk1, had been identified and function-
ally characterized before the completion
of the genome project. However, both
haploid invasive growth and diploid
pseudohyphal development of budding
yeast are known to require Ste11 and
Ste7, which are a MAP kinase kinase
kinase (MAPKKK) and a MAP kinase
kinase (MAPKK), respectively. These ki-
nases normally function in the mating
pheromone response pathway to acti-
vate the Fus3 (or Kss1) MAP kinase, and
also Ste12, a transcription factor that is
phosphorylated and activated by Fus3
(Ref. 5). This suggested that a MAP ki-
nase would be required for pseudo-
hyphal development, but none of the
five characterized MAP kinases proved
to be essential for haploid invasive

Table I. Classification of Saccharomyces cerevisiae protein kinases (contd)



NIMA/NEK family (Similar to NIMA_en, NIMA1_h, NEK1_h)

KIN3/NPK1/FUN52/YAR018C Ser/Thr protein kinase; null mutation has no phenotype. TTY in kinase subdomain VIII (activation loop)

'NEK' family (Similar to F35Q12.3_ce, weakly to NEK1_h)
YNU|iaE0C/N9|23 Ser/Thr protein kinase of unknown function
YM9BW/(mia} Ser/Thr protein kinase of unknown function
YBROSSC/YBROilS Ser/Thr protein kinase of unknown function



Other group (24 members)

Casein kinase I family (Similar to CKI_h)

YCK1/CKI2/YHR135C Casein kinase I (CKI) isoform
YCK2/(CIUl)/(CtU)/lil758/YNL154C CKI isoform
YCK3/CKI3/YER123W CKI isoform

HRR25/PlS50/YPL204W CKI, Ser/Thr/Tyr protein kinase; associated with DNA repair and meiosis

Casein kinase II family (Similar to CKA2_h)
CKA1/YIL035C CKII, catalytic (α) subunit
CKA2/0381O/YOR061W CKII, catalytic (α') subunit

*CDC7/SAS1/OAF2/D2855/YDL017W Protein kinase required for initiation of DNA synthesis, for commitment to sporulation, for DNA repair
and for meiotic recombination. (Similar to HSK1_sp)

NPR/HAL5 family (Unique to S. cerevisiae)

HAL5/JOSSl/YJL165C Ser/Thr protein kinase involved in salt and pH tolerance
YNUi88C/YKU32 Ser/Thr protein kinase of unknown function
SAT4/YCR10VVCR046/YCR008W Protein with similarity to Npr1p protein kinase
Yifl088W/J172B Putative Ser/Thr protein kinase of unknown function
PTK1/YKL198C Ser/Thr protein kinase that enhances spermine uptake. (Frame shift corrected)

NPR1/Nlfiao/YNL183C Ser/Thr protein kinase involved in regulating transport systems for nitrogen nutrients under conditions of nitrogen
catabolite derepression
Y01214C/D1014 Ser/Thr protein kinase with similarity to Npr1
YttMSC/IKtttO Protein with similarity to protein kinase Npr1p
YOR267C/09420 Ser/Thr protein kinase with similarity to Npr1p

ELM family (Unique to yeast, similar to D45882_sp)

PAK1/tYQMRMS/YER129W Protein kinase capable of suppressing DNA polymerase α mutations
Y01179C/ilK860/0m8 Ser/Thr protein kinase with similarity to Elm1p and Kin82p
ELM1/YKL261/YKL048C Ser/Thr protein kinase regulating pseudohyphal development

RAN family (Similar to RAN_sp, p78_h)

SHA3/SKS1/LPQ8/YPL026C Ser/Thr protein kinase; suppressor of hta1 mutations that cause aberrant transcription
Y8fRMnl(/YM^ Ser/Thr protein kinase with similarity to S. pombe RAN1, negative regulator of sexual conjugation and meiosis
*KSP1/YHR082C Ser/Thr kinase with similarity to CKII

PIM-like family (Similar to PIM2_m, KIAA0135_h)

*YMItt7W/YKUKM(/niN31 Ser/Thr protein kinase of unknown function
*YOL046W/02D34/YOIJ044W Ser/Thr protein kinase of unknown function



Unique kinases (17 members) (No similar S. cerevisiae kinases)



Kinases with possible homologues in other species

*CDC5/PIUK^MSDa/YNI8270.03/YMR001C Ser/Thr protein kinase required for exit from mitosis; ts mutants block after nuclear division. (Similar
to PLK1_h, POLO_dm)

*IPL1/P1820/YPL209C Ser/Thr protein kinase involved in chromosome segregation. (Similar to AUR_dm)

*IRE1/ERN1/YHR079C Protein kinase and type I membrane protein involved in signal transduction from ER lumen to nucleus; part of the unfolded
protein response. (Similar to C41C4.4_ce)
*VPS15/VPTlBi/(VPtl9)/YBR0828/YBR097W Ser/Thr protein kinase involved in vacuolar protein sorting. (Similar to ZK930.1_ce)
*Ym38C/n087 Protein of unknown function. No GxG (Similar to C3H1.13_sp) (Not in YPD)
*]gj^jWW/Qlfll8 Ser/Thr protein kinase of unknown function. (Similar to UNC51_ce, PLO1_sp)

*SWE1/i040C/YJL187C Ser/Tyr dual-specificity protein kinase able to phosphorylate Cdc28p on tyrosine and inhibit its activity. (Similar to Wee1_sp
and MULh)

*SPK1/RAD53/MEC2/SAD1/P2S88/YPL153C Ser/Thr/Tyr protein kinase with a checkpoint function in S and G2. Contains FHA domain. (Similar
to CDS1_sp)

*MPS1/RPK1/M786/YDL028C Ser/Thr/Tyr protein kinase involved in spindle pole body duplication (Similar to ESK_m, TTK_h)
*YiailBC/YXUU6 Ser/Thr protein kinase with similarity to S. pombe NIM1 protein kinase. (33% identity to p78_h)

REVIEWS



Table I. Classification of Saccharomyces cerevisiae protein kinases (contd)



Kinases with possible homologues in other species (contd)

*GCN2/AAS1/D9954.16/YDR283C Ser/Thr protein kinase that regulates initiation of translation by phosphorylation of eIF2α (Sui2p) (Similar to
EIF2aK_r, HRI_r)

*VBia74W/YB!U.742 Protein kinase with similarity to members of the growth factor and cytokine receptor family. (Similar to CHK1_sp, CHK1_ce, SHPlj)
*YGR262C/G9334 Protein with similarity to apple tree CaM-binding protein kinase PIR:JQ2251. Lacks GxG - not in alignment (Similar to
O-sialoglycoprotein endopeptidase from Methanococcus jannaschii)

Kinases without known homologue

*BUB1/G7542/YGR188C Ser/Thr protein kinase and checkpoint protein required for cell-cycle arrest in response to loss of microtubule function.
(Amino terminus similar to MAD3_sc)
*YKL171W/YKLfi35 Ser/Thr protein kinase of unknown function
*YGR052W/G4329 Protein of unknown function

*YPR106W/P62tt3.9 Protein with similarity to protein kinases Gcn2p, galactosyltransferase-associated protein kinase P58/Gtap, and the Raf
proto-oncoprotein



Atypical protein kinases (1 member)

*YGR080W/G4583 Protein with similarity to human tyrosine kinase A6 PIR:A55922.



Miscellaneous kinases



TOR1/DRR1/JlseS/YJR066W Phosphatidylinositol kinase (PI kinase) homologue involved in cell growth and sensitivity to the
immunosuppressant rapamycin

TOR2/DRR2/YKL203C PI kinase homologue involved in cell growth and sensitivity to the immunosuppressant rapamycin; similar to Tor1p
VPS34/vipT29/(VPL7)/END12/L9672JBAWtt40W PI 3-kinase required for vacuolar protein sorting; activated by protein kinase Vps15p
PIK1/Plit41/PiK120/N079S/YNL267W PI 4-kinase; generates PtdIns(4)P

STT4/L2142.4/YLR305C PI 4-kinase; mutants are staurosporine-sensitive and suppressible by overproduction [remainder illegible]

MEC1/ESR1/SAD3/YBR1012/YBR136W Checkpoint protein required for mitotic growth; DNA repair and mitotic recombination (PI kinase homologue)
TEL1/YBL0706/YBL088C Protein involved in controlling telomere length; might have PI 3-kinase or protein kinase activity
YHR099W Protein with weak similarity to Tor1p and Tor2p; possible PI kinase homologue
FAB1/YFR019W Probable PtdIns(4)P 5-kinase involved in orientation or separation of mitotic chromosomes
MSS4/YD8142A.05/YD8142.05 Potential PtdIns(4)P 5-kinase; multicopy suppressor of stt4 mutation

Guanylate kinases 

GUK1/D94G139/YDR454C Guanylate kinase 

^^^^£JX/S\^C Two-component signal transducer with both a His kinase domain and a receiver domain that functions in the high osmolarity 
YlS R^lKmK^^^ branched-chain α-ketoacid (BCKD) and pyruvate dehydrogenase (PDH) kinases, which are protein-serine kinases 

"^SSISaJo?^ Involved in the expression of mitochondrial cytochrome c oxidase subunit 2 (COX2). (Ser/Thr protein kinase that 
suppresses the growth defect of snf3 mutants on low glucose.) 
YDR109CAD9727.06 Protein with similarity to FGGY protein kinase family 
YiROeiW Putative Ser/Thr protein kinase of unknown function. (Similar to YKL2(K)c.sc, YKL201c.sc) 
YUt063W/l2174 Ser/Thr protein kinase of unknown function (Unique) 
YNttJ05W/YM99S8.<l3 Protein kinase of unknown function (Similar to ZK370.4.ce. MUO.T.ce) 

YMR192W/YIW9648,04 Ser/Thr protein kinase of unknown function. (Leucine repeat and possible coiled coil. Similar to YPL249C.sc, U4994g.ce) 
Y010B9W/O3441 Protein with similarity to rat branched<;hain tt^tetoacid dehydrogwase ^l^^\^f-^^J^%^ . ^ ..^j.yes in 

Y6U27W/00958 Protein with similarity to Dictyostelium discoideum nonreceptor tyrosine kinase U32174. 37% identity over 64 residues in 
amino-terminal noncatalytic domain 
Y0R2B7C/0S492 Protein with weak similarity to PITSLRE protein kinase isoforms 



mB budding yeast protein kinases are subdivided into distinct families based on structural similarity In jTi^*:,^^^ 
based on ttTat deviwd by Hanks and Hunter^. Individual kinases are listed by their preferred gene nfflne as f^^^J^:^^l^„ten^ 
Dalhase at Stanford, feilowed by their synonyms and a brief description as malnteined in "«J«^^'" S?Sl|iSy 
http://www.proteome.com. Additional notes, sequence corrections or close ^re "n^renme^^ " 

by a^ asterisk share only weak similarity to other members. EnWes Bsted '^J^^f^J^^^^^^C^^'^S^oS^ aWlUonal 
tural similarity to the protein kinase family and were excluded from this analysis {some o ^'^^^^^^S^^Sm YMR216C (Group IllD) and 
open reading frames were Identified that encode protein kinases that were not P«"'^'"J^r^. S^^ii^tetevM^ 
YPL236C (Group VP). These new protein kinases were recognized fo«owing a comprehensive ana^ s of the complete yeast oiw sequence ^ 
software (Oxford Molecular) implementation of the Smith Waterman algorithm on a MasPar parallel computer. 

Table I. Classification of Saccharomyces cerevisiae protein kinases 



MCgroypdTmmbm) 



nkmh tmPflmnrtw* (SImflar to PKAJt) 

1Ml/PMViMS/Pia8/J0641AJU^ PKA t catalytic subuntt 

ma/ms/mMM/ymmo pka 3. catalytic ^inm 

PKCVSnVHPO2/(%Yi5/YBL08(>7/YBL108C PKC; regulates MAP kinase cascade involved in regulating cell wall metabolism 

AftC«wir(SIrnUartoSCKljBp.RAC£U»,AKrj» ^ ^ ^ 

SCm/KONU/VHIOOSW Ser/Thr protein kinase activated by cAMP; overproduction can suppress cdc25 mutant 
YPNl/YIOiaeW Ser/Thr protein kinase with similarity to PKC 
m2/n(B2/mB7iJki^/mStlSi4C Ser/Thr protein kinase with similarity to Ypk1p 

86K (70 kMi) Mty (Similar to KADS.sp, SGKj) 

KIN8^/YCR11SS/YCR091W Ser/Thr protein kinase of unknown function 
YNIMM7W/N3449 Ser/Thr protein kinase of unknown function 

DBr2Mly(SimBarto KAIBjsp. NDRJi) 

DIFVCt4M/YUKW2W Ser/Thr protein kinase similar to Dbf20p; required for anaphase/telophase 
Dm/PttSU/vnuiuw Celkyde pro^ 

PMUeWid fimHy (Unique to S. cerevisiaB) 

Y0tlO0W/raittO8l/O0784 Ser/Thr protein kinase of unknown function 
VDM90e/De03U3 Ser/Thr protein kinase of unknown fu^^ 
WmSiM/timMM Ser/Thr protein kinase of unknown function 

(Wmt AOC tMRQy (Similar to Cmjip, OOTdUc* NDR.h. LATS.dm. MAST205.m} 

YROSSC Ser/ntr protein kinase with similarity to 5cftto8ocf)aroni)ces ptmiteC^ protein kinase 
YNUftlW/N1727 Putative Ser/Thr protein kinase of unknown function 
^YIROasc/YBmma Ser/Thr protein kinase Witt) similarity to Y0k^ 



CaMK9Ott^(10iMmbm) 



tiiitfcmlr Ci^ ortmodunw ngulM (Similar to CaMKIJi) 

emi/imM Ca2+-calmodulin-dependent Ser/Thr protein kinase (CaM kinase), type I 
ein2/9inM/tQIJMC CaM kinase type II 

MNl/ttiM/VBliSlllir Ser/Thr protein kinase with similarity to Cmk1p, Cmk2p and Cmk3p. (Sequence updated) 
mm^lim/W$ftMnM/WM9ni com kinase 

tNn/ANiK^lmRK(S(i^ NPKSjft. PUU.ce) 

IWpmi$7iirj^^^ Ser/Thr protein kinase; similar to Kin2p and S. pombe KIN1 
Kii4/||S$^i*9^^ protein kinase; similar to Kin1p and S. pombe KIN1 

MiM^raM^^ Ser/Thr protein kinase; similar to Kin1p and Kin2p; catalytic domain is most similar to Snf1p 

YMJitiCi^fr Ser/Thr protein kinase with similarity to Kin4p 
YniiMI/m07 Ser/ntr prote^ 



/(Similar to SNF1.5C) 

ttM/MTlftja/YDRmc Ser/Thr protein kinase with similarity to Ycl024p; growth inhibitory protein 
VCUMAW Protein with similarity to Snf1p 
Htl^Via463/ViajfilW Ser/ntr prateh kinase that 

Other CMK tair (StmOv to Z71478_sp, MlCKjdd. (MAKUt) 
inviMfeVOiSaT/VOinBlC Stf/nv protein kinase requM 



PM>tt/0IW I70/YW1 01C Protein kinase necessary for induction of Rnr3p and DNA repair genes after DNA damage; contains FHA domain 
*YMR2liW/<NV8SOI48 Ser/Thr protein kinase of unknown function 



CMee«oi9(tt«WQbm) 



€OICM(^itriteto^^^C^ 

Wsm/eqns/IUmvmi^^ Cyclin-dependent protein kinase (CDK) essential for completion of START and for mitosis; 
associates with Cks1p and cyclins. PSTAIRE in kinase subdomain III 
PHOevraojU^^/VPUmc CDK that interacts with ^ 

Table I. Classification of Saccharomyces cerevisiae protein kinases (contd) 



CDK family contd 

CAK1/CIV1/YFL029C CDK-activating kinase (Ser/Thr protein kinase) responsible for in vivo activation of Cdc28p. PHNAKFE in kinase 
subdomain III 

SSN3/UME5/SRB10/(ARE1)/P7102.08/YPL042C Ser/Thr CDK of the RNA polymerase II holoenzyme complex and mediator (SRB) subcomplex. 
SQSACRE in kinase subdomain III. TLY in kinase subdomain VIII 
KIN28/ORF2330/YDL108W Ser/Thr CDK component of transcription initiation factor TFIIH; phosphorylates carboxy-terminal domain (CTD) of 
RNA polymerase large subunit. DMSAIRE in kinase subdomain III 

MAPK family (Similar to SPK1.sp, ERK.h) 

KSS1/G4149/YGR040W Ser/Thr protein kinase; redundant with Fus3p for induction of mating-specific genes by mating pheromone. TEY in kinase 
subdomain VIII (activation loop) 

FUS3/DAC2/YBL0303/YB1,03.21/YBL016W Ser/Thr protein kinase required for cell-cycle arrest and for cell fusion during mating. TEY in 
activation loop 

HOG1/SSK3/Ld3S4.2/U931/YLR113W Ser/Thr protein kinase; involved in high-osmolarity signal transduction pathway. TGY in activation loop 
SLT2/MPK1/SLK2/BYC2/YHR030C Ser/Thr protein kinase involved in the cell wall integrity pathway. TEY in activation loop 
YKL161CAKLG15 Ser/Thr protein kinase of unknown function. KGY in activation loop 

SMK1/YP9499.10/YPR04%W Sporulation-specific MAP kinase required for completion of sporulation. TNY in activation loop 

GSK3 family (Similar to SHAGGY.dm, GSK3.h) 

MCK1/(YPK1)/N0392/YNL307C Ser/Thr/Tyr protein kinase (meiosis and centromere regulatory kinase); positive regulator of meiosis and spore 
formation 

YOL128C/00530/ORF1209713 Ser/Thr protein kinase of unknown function 
MDS1/RIM11/GSK3/YM9376.08/YMR139W Ser/Thr protein kinase; homologue of mammalian GSK3 
MRK1/D24S9/D2461/YDLD7SC Ser/Thr protein kinase with similarity to Mds1p 

CLK family 

KNS1/U224AU.019C Ser/Thr protein kinase of unknown function. (Similar to CU^ti) 

YAK1/YJL141C Ser/Thr protein kinase that suppresses loss of Tpk1p + Tpk2p + Tpk3p. (Similar to KA23.sp, MNB.dm, MNB.h) 
YMR216C/YM8261^0 Putative Ser/Thr protein kinase; has similarity to Cdc31p. (Similar to DSK1.sp, SRPK1.ce, U52111.h). (Not in YPD listing) 
IME2/SME1/J0817/YJL106W Ser/Thr protein kinase and positive regulator of sporulation genes essential for initiation of meiosis. TAY in activation 
loop. (Similar to MA»Lr, p34_h, CDK2.h) 

Other CMGC family (Similar to PITSLRE.h, CHED.h) 

SGV1/BUR1/P9584.8/YPR161C Ser/Thr protein kinase involved in pheromone adaptation pathway and in cell cycle. PlTAgRt in Kinase 

CTK1/YKL139W CTD kinase α subunit; CDK that phosphorylates CTD of RNA polymerase II large subunit. PITSIRE in kinase subdomain III 



STE11/STE20 group (10 members) 



^'^'SSSSS^SvS^^ of the pheromone pathway and a pathway regulating pseudohyphal development 

BCK1/(SLK1)/SSP31/LAS3/SAP3/J0906/YJL095W Ser/Thr protein kinase; involved in the cell wall integrity pathway 
SSK2/N3276/YNR031C MAP kinase kinase kinase (MEKK) of the high osmolarity signal transduction pathway 
SSK22/YCR073C MEKK with strong similarity to Ssk2p; participates in the high osmolarity signal transduction pathway 

STE20/PAK family (Similar to PAMm, PAK1.h, PAK65.h, RAC.h) 

ctS^UWC Ser/Thr protein kinase in the pheromone pathway; also participates in pathway regulating pseudohyphal development 
CLA4/ERC10/ri04SO/YNU)450/YNL298W Ser/Thr protein kinase required for cytokinesis; has similarity to Ste20p 
YOL113W/HRA659/00722 Ser/Thr protein kinase with similarity to Ste20p 

NRK/MESS family (Similar to MESS1.m, ZC5044_ce) 

NRK1/KIC1/HB283.14/YHR102W Ser/Thr protein kinase that interacts with Cdc31p 
SPS1/l>9719^7/YDR523C Ser/Thr protein kinase involved in middle/late stage of meiosis 

*CDC15/YAR019C Protein kinase of the MAP kinase kinase kinase family essential for late nuclear division. (Similar to MESS1.m, 
CDC7.sp, MST1.h) 



STE7/MEK group (8 members) 



"^iESdS!^ kinase of MAP kinase kinase (MEK) family; component of the pheromone pathway and a pathway 

mS^S^mljS Ser/Thr/Tyr protein kinase of the MEK family involved in cell wall integrity pathway 
SKS{^^^ kinase of the MEK family involved in cell wall integrity pathway. (Sequence updated) 

Dendrogram of the budding yeast protein kinase superfamily. The catalytic domains of 113 yeast protein kinases were aligned using SAM, 
a multiple sequence alignment program that applies a linear hidden Markov model to facilitate recognition of conserved subdomains within 
a protein family (http://www.cse.ucsc.edu/research/compbio/sam.html). The SAM alignment was run on a MasPar parallel computer and 
the results were inputted into PROTPARS, a protein sequence parsimony method, to build an unrooted phylogeny. PROTPARS is part of the 
PHYLIP package written by J. Felsenstein of the University of Washington (http://evolution.genetics.washington.edu/phylip.html). 



that Ste20 and Cla4 share a function in this process. Nevertheless, the
double ste20/cla4 mutant can still assemble actin, a process regulated by
Cdc42 and other Rho family members. This could be accounted for by
YOL113W, a third member of the Ste20 family whose existence has been
revealed by the genome sequencing project; it will obviously be important
to analyse the function of YOL113W and determine to what extent its
function is redundant with those of Ste20 and Cla4. Like the other two
family members, YOL113W has both a phospholipid-binding pleckstrin
homology (PH) domain and a Rac/Cdc42-binding motif (SxPx^HxxH) upstream
of its catalytic domain. 

The function of Ste20 subfamily protein kinases is of significant
interest, because it expanded greatly during evolution and there are a
large number of Ste20-related protein kinases in vertebrates, such as the
Paks, which are regulated by Cdc42 and other members of the Cdc42 GTPase
family. 

Cell-cycle control 

The regulation of the cell cycle involves many types of protein kinase.
In budding yeast, Cdc28 is the only known CDK with an essential role in
cell-cycle regulation, although two other protein kinases, Pho85 and
Kin28, bind and are activated by cyclins and might have roles in
cell-cycle progression. Pho85 can play a direct role in G1 regulation,
whereas Kin28 has an indirect role as a component of the basal
transcription factor TFIIH. All three of these yeast protein kinases
contain a canonical AIRE sequence in catalytic subdomain III, which is
part of the α1-helix that interacts through the conserved Ile with the
cyclin subunit. 

An additional member of the CDK cluster, Ssn3/Srb10, has no apparent role
in cell-cycle regulation, but forms a complex with cyclin Srb11 as part of
the larger RNA polymerase II holoenzyme. Ctk1 is a divergent CDK
identified as an RNA polymerase II carboxy-terminal domain kinase, which
binds Ctk2, a cyclin-related protein, and contains SIRE instead of AIRE.
The genome sequence revealed a number of new cyclins, but only one new
CDK-related gene, CAK1/CIV1/YFL029C. This gene is unlikely to encode a
true CDK, because it has an AKFE sequence instead of the AIRE motif;
indeed CAK1/CIV1/YFL029C has recently been shown to be active as a monomer
and to be a CDK-activating kinase (CAK) that phosphorylates Thr in the
activation loop of Cdc28 (Refs 14-16), a function that in mammalian cells
is currently thought to be carried out by cyclin H/Cdk7. The CDK subfamily
evidently underwent significant expansion during the evolution of the
multicellular eukaryotes, as in vertebrates at least four CDKs are
directly involved in cell-cycle regulation and eight CDKs are known
altogether. 
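
The subdomain III comparison described above can be sketched in a few lines of Python. This is only an illustration: the motif strings are the ones quoted in Table I and the text for Cdc28, Kin28, Ctk1 and Cak1, and the function name `classify_motif` and its three labels are hypothetical, not part of the original analysis.

```python
# Subdomain III motifs as quoted in Table I and the text (illustrative subset).
MOTIFS = {
    "Cdc28": "PSTAIRE",
    "Kin28": "DMSAIRE",
    "Ctk1": "PITSIRE",
    "Cak1": "PHNAKFE",
}


def classify_motif(motif: str) -> str:
    """Label a subdomain III motif by its last four residues (hypothetical helper)."""
    tail = motif[-4:]
    if tail == "AIRE":
        return "canonical AIRE"
    if tail == "SIRE":
        return "divergent (SIRE)"
    return f"non-CDK-like ({tail})"


for name, motif in MOTIFS.items():
    print(f"{name:6s} {motif}  {classify_motif(motif)}")
```

A four-residue suffix test is a deliberate simplification; it mirrors the AIRE/SIRE/AKFE distinction drawn in the text and says nothing about cyclin binding itself.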

Another cell-cycle regulatory protein kinase, NIMA, which is required for
the G2-M transition in Aspergillus, has so far been only identified in
filamentous fungi. Given the highly conserved nature of cell-cycle
regulation, one might anticipate that other eukaryotes would have NIMA
homologues, and some evidence for a NIMA response pathway in vertebrates
has been obtained. However, no true homologues have been identified in
other eukaryotes, although there are several NIMA-related protein kinases
in vertebrates. The yeast Kin3/Npk1 protein kinase is quite closely
related to NIMA in its catalytic domain, but lacks the carboxy-terminal
regulatory domain, which is critical for NIMA cell-cycle function. Thus,
it appears that there might be no budding yeast homologue of NIMA. 

There are several solitary yeast cell-cycle regulatory protein kinases,
including Cdc5, Cdc7 and Cdc15. Cdc5 is required for exit from mitosis,
and its three counterparts in higher vertebrates, Plk, Fnk/Prk and Sak,
form a subfamily. Plk, Polo (a Drosophila relative) and Plo1 (a fission
yeast relative), like Cdc5, have been implicated in progression through
mitosis. However, Cdc7, a protein kinase in the casein kinase II family,
which is required for initiation of DNA synthesis during S phase and is
related to Hsk1 from fission yeast, has no known mammalian counterpart.
Cdc15, which is essential for completion of mitosis, also has no known
mammalian homologue; it is related to human MESS1 in the catalytic domain,
but has a long dissimilar carboxy-terminal tail. 

Diversity of yeast protein kinases 

Most of the main vertebrate subfamilies of protein kinases are
represented in yeast. For example, in the AGC group there are multiple
cAMP-dependent protein kinases (PKAs), a single protein kinase C (PKC),
and 70 kDa S6 kinase-related protein kinases, but no protein kinases
closely related to cGMP-dependent protein kinase, Rsk or βARK. In the
CaMK group, there are Ca2+-calmodulin-regulated and AMP-dependent protein
kinases, but no true MLCK, perhaps because myosin-based motility is
limited in yeast. In the CMGC group, all the main subfamilies are
represented in yeast, including CDKs, MAPKs, and GSK3- and Clk-related
protein kinases. 

New groups. As this is the first analysis of all the protein kinases
present in a complete eukaryotic genome, it is appropriate to designate
two new groups that are conserved in all eukaryotes, namely the
STE11/STE20 group including the MAPKKKs and the STE7 group including the
MAPKKs. In addition, several new subfamilies can be established in the
'Other group', including casein kinase I and II, the Ran subfamily and
two subfamilies that appear to be unique to budding yeast, the Npr/Hal5
and Elm subfamilies. Notably absent, however, are receptor-type protein
kinases, and Raf-related protein kinases, which are MAPKKKs that act
downstream of receptor protein-tyrosine kinases. Although yeast has
several STE11/MEKK-family MAPKKKs, the absence of Raf-related MAPKKKs and
receptor protein kinases in general might reflect the fact that yeast has
little need for intercellular communication, other than to respond to
mating pheromones, which is accomplished by G protein-coupled receptors.
Many of the protein kinase subfamilies have undergone significant
expansion during evolution; for example, there is only one PKC in yeast,
but at least nine in vertebrates. 

No true protein-tyrosine kinases. As anticipated from many unfruitful
sequence-based searches, budding yeast has no members of the true
protein-tyrosine kinase family. The progenitor for this family probably
arose when multicellular organisms evolved. The driving force behind the
evolution of protein-tyrosine kinases was presumably the need for a
signaling mechanism for cell-cell communication within a multicellular
organism. The concomitant evolution of phosphotyrosine-binding domains
that could mediate protein-tyrosine kinase signal-dependent
protein-protein interactions must also have been a critical event. 

The absence of typical protein-tyrosine kinases, however, does not mean
that enzymes of this specificity are completely lacking, and there are
several examples of what have been termed dedicated protein-tyrosine
kinases. For example, Swe1, a member of the Wee1 family, phosphorylates
Tyr19 in Cdc28, negatively regulating the activity of this CDK. Other Wee1
family members can autophosphorylate on serine, threonine and tyrosine,
and as a result are commonly known as dual-specificity protein kinases.
Although there are no other Swe1-related protein kinases in yeast that
could regulate CDK function, there are two protein kinases,
Spk1/Rad53/Mec2/Sad1 and Mps1/Rpk1, involved in S and G2 checkpoint
control and spindle pole body duplication respectively. These kinases are
also dual-specificity protein kinases that, at least in vitro, have
protein-tyrosine kinase activity. Yeast also has true dual-specificity
protein kinases in the MAPKK family, which phosphorylate the Thr and Tyr
in a TxY motif in the activation loop of members of the MAP kinase family.
The existence of three bona fide protein-tyrosine phosphatases in yeast
underscores the importance of protein-tyrosine phosphorylation in this
organism. 
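
The TxY motif mentioned above lends itself to a small pattern-matching sketch. The activation-loop tripeptides below are the ones listed for the MAPK family in Table I; the dictionary, the regular expression and the helper name `has_txy` are assumptions made for illustration and were not part of the original analysis.

```python
import re

# Thr, any single residue, Tyr: the TxY motif phosphorylated by MAPKKs.
TXY = re.compile(r"T.Y")

# Activation-loop tripeptides as listed in Table I (MAPK family and Ime2).
ACTIVATION_LOOP = {
    "Kss1": "TEY",
    "Fus3": "TEY",
    "Hog1": "TGY",
    "Slt2": "TEY",
    "Smk1": "TNY",
    "Ime2": "TAY",
    "YKL161C": "KGY",
}


def has_txy(tripeptide: str) -> bool:
    """Return True if the three residues form a T-x-Y motif (hypothetical helper)."""
    return TXY.fullmatch(tripeptide) is not None


for name, loop in ACTIVATION_LOOP.items():
    label = "TxY" if has_txy(loop) else "no TxY"
    print(f"{name:8s} {loop}  {label}")
```

Under this test YKL161C, with KGY in its activation loop, is the only listed entry that lacks the threonine of the dual-phosphorylation site.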

All the protein kinases classified in the protein-tyrosine kinase family
based on sequence analysis have experimentally verified tyrosine
phosphorylating specificity. However, it is not yet known exactly how
tyrosine, rather than serine or threonine, is selected for
phosphorylation, despite the availability of several protein-serine and
protein-tyrosine kinase 



TIBS 22 - JANUARY 1997 

catalytic-domain three-dimensional structures. This means that one cannot
exclude the possibility that a novel solitary protein kinase is a
protein-tyrosine kinase until it is tested biochemically. Indeed, mammals
have a novel protein-tyrosine kinase, A6, which is totally unrelated in
sequence to the conventional protein-tyrosine kinase family. There is an
A6-related gene, YGR080W, in yeast, and additional homologues in the human
EST database. It will be interesting to test whether these genes encode
protein-tyrosine kinases. 

Protein kinases found only in yeast 

Some subfamilies of yeast protein kinases so far appear to be unique to
budding yeast. These include a PK/-related family (YOL100W, YDR490C,
YDR466W), a Nek-like family (YNL020, YIL095W, YBR059C) and the Npr/Hal5
family, which has nine members divided into two groups related to Hal5, a
protein kinase involved in salt and pH tolerance, and Npr1, a protein
kinase involved in regulating transport systems for nitrogen nutrients
under conditions of catabolite derepression. If one were to predict what
sort of protein kinases might be unique to yeast, those involved in
nutrient uptake or resistance to environmental stress would be obvious
candidates. The same might be true for the Elm family, where the
eponymous protein kinase Elm1 is involved in pseudohyphal growth, a
process that is unique to yeast. There are also a number of yeast protein
kinases that have homologues in other species, but not in vertebrates; for
instance the Ran family contains protein kinases closest to the fission
yeast Ran1 protein kinase. 

There are currently four yeast protein kinases that have no known
homologues in other species (Bub1, YKL171W, YGR052W and YPR106W). However,
although protein kinases in these families have not been identified in
vertebrates so far, several of the yeast protein kinases are close
relatives of Caenorhabditis elegans protein kinases, where the genome
sequence is now nearly 60% complete. Thus, as the human genome sequence
progresses, one can anticipate that many of the apparently unique yeast
protein kinases will prove to have vertebrate homologues. 

Nonconventional protein kinases 

In addition to genes in the conventional protein kinase superfamily,
there is also a single gene that encodes a protein in the prokaryotic
'histidine' protein kinase family, Sln1 (Ref. 31). Such proteins
autophosphorylate on a histidine residue in response to a specific
stimulus, and then this phosphate is transferred to an acceptor signal
response protein on an aspartate residue. Given the frequency of such
protein kinases in prokaryotes, where more than ten are known in
Escherichia coli, it is surprising that there are not more protein kinases
of this type in yeast. It is interesting, however, that the Sln1 histidine
kinase plays a role in the response of yeast cells to osmotic pressure, a
response that is transduced in bacterial cells by the histidine kinase
EnvZ. The YIL042C protein is related to the mitochondrial branched chain
α-ketoacid and pyruvate dehydrogenase protein kinases, which are unusual
protein-serine kinases that are recognizable members of the 'histidine'
protein kinase family. 

There are a number of eukaryotic proteins with bona fide protein kinase
activity, such as the Dictyostelium myosin heavy chain protein kinase,
that are structurally unrelated to either the eukaryotic protein kinase
superfamily or the prokaryotic signal response protein kinases. Well over
60% of the genes in the yeast genome are of unknown function, and it is
certainly possible that some of them will be unconventional protein
kinases, such as the A6-related protein discussed above. 

The yeast genome has ten genes in the lipid kinase family, which have a
domain related to the catalytic domain of the protein kinase superfamily.
Some of these are bona fide phosphatidylinositol kinases (e.g. Vps34,
Pik1, etc.), whereas others have not been found to have lipid kinase
activity. Given that DNA-PK, a mammalian protein in this family, is a
genuine protein kinase whose activity is stimulated by double-stranded DNA
ends, there has been speculation that some of the members of the lipid
kinase family are in fact protein kinases. Indeed, it appears likely that
the members of this family that are involved in checkpoint function, such
as Mec1 and Tel1, are protein kinases. Moreover, another lipid kinase
subfamily, which includes the rapamycin-binding proteins, Tor1 and Tor2,
is known to autophosphorylate, and it is likely that these are also
protein kinases, although their salient substrates remain to be
identified. 

Perspectives 

If we include all the different types of protein kinase encoded by the
yeast genome, we reach a total of ~120; a number that is a little lower
than the most recent estimate of the number of yeast protein kinases,
which was based on the sequencing of chromosome III (Ref. 36). Recent
analysis of GenBank and unfinished sequence databases for C. elegans
(kindly provided by the Sanger Centre, Cambridge, UK) so far reveals 270
unique protein kinases (G. D. Plowman, unpublished). However, although
these data represent approximately 60% of the 100 Mb nematode genome,
which is about eight times larger than that of yeast, the number of
kinases per kb of DNA is only half that predicted from the analysis of the
yeast genome. However, the number and length of introns in higher
eukaryotes is much greater than in yeast, which decreases the percentage
of coding regions. 
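
The "only half" comparison can be checked with a short calculation. The sketch below uses only the rounded figures quoted in the paragraph above (about 120 yeast kinases, 270 worm kinases in roughly 60% of a 100 Mb genome, and a worm genome about eight times the size of the yeast genome); the variable and function names are illustrative, and the implied yeast genome size of 12,500 kb is a consequence of those rounded figures rather than a measured value.

```python
# Approximate figures quoted in the text.
YEAST_KINASES = 120              # "~120" protein kinases in yeast
WORM_KINASES_FOUND = 270         # unique kinases found so far in C. elegans
WORM_GENOME_KB = 100_000         # 100 Mb nematode genome
WORM_FRACTION_SEQUENCED = 0.60   # "approximately 60%" of the genome analysed
GENOME_SIZE_RATIO = 8            # worm genome "about eight times larger"


def kinases_per_kb(n_kinases: float, genome_kb: float) -> float:
    """Kinase genes per kilobase of genomic DNA."""
    return n_kinases / genome_kb


# Yeast genome size implied by the quoted ratio (an assumption of this sketch).
yeast_genome_kb = WORM_GENOME_KB / GENOME_SIZE_RATIO

yeast_density = kinases_per_kb(YEAST_KINASES, yeast_genome_kb)
worm_density = kinases_per_kb(
    WORM_KINASES_FOUND, WORM_GENOME_KB * WORM_FRACTION_SEQUENCED
)
density_ratio = worm_density / yeast_density

print(f"yeast: {yeast_density * 1000:.1f} kinases per Mb")
print(f"worm:  {worm_density * 1000:.1f} kinases per Mb")
print(f"worm/yeast density ratio: {density_ratio:.2f}")
```

With these inputs the worm density comes out a little under half the yeast density, in line with the statement in the text.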

By extrapolation, we can estimate the number of protein kinase genes in
mammals, which have about four times as many genes as C. elegans. Based on
this estimation, a prediction of more than 1000 protein kinase genes in
the human genome still seems a reasonable one, particularly if one takes
into account the expansion of the protein-tyrosine kinases used for
intercellular signalling in higher organisms. By the time the human genome
project is completed we will know how accurate this estimate was. 
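
One way to reproduce the extrapolation is sketched below. It assumes, as a simplification that the text does not spell out, that kinase counts scale linearly both with the fraction of the worm genome analysed (270 kinases in about 60%) and with total gene number (mammals having about four times as many genes as C. elegans). The function name is hypothetical.

```python
WORM_KINASES_FOUND = 270        # from the analysis of ~60% of the worm genome
WORM_FRACTION_SEQUENCED = 0.60
MAMMAL_TO_WORM_GENE_RATIO = 4   # "about four times as many genes"


def extrapolate_mammalian_kinases(
    found: float, fraction: float, gene_ratio: float
) -> float:
    """Linear extrapolation: scale to the full worm genome, then by gene ratio."""
    worm_total = found / fraction
    return worm_total * gene_ratio


estimate = extrapolate_mammalian_kinases(
    WORM_KINASES_FOUND, WORM_FRACTION_SEQUENCED, MAMMAL_TO_WORM_GENE_RATIO
)
print(f"Estimated mammalian protein kinase genes: ~{estimate:.0f}")
```

Under these linear assumptions the estimate lands well above 1000, which is consistent with the prediction quoted in the text; the real figure depends on how unevenly kinase genes are distributed.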
Protein architecture, dynamics 
and allostery in tryptophan 
synthase channeling 

Peng Pan, Eilika Woehl and Michael F. Dunn 

The α2β2 form of the tryptophan synthase bienzyme complex catalyses the 
last two steps in the synthesis of L-tryptophan, consecutive processes that 
depend on the channeling of the common metabolite, indole, between the 
sites of the α- and β-subunits through a 25 Å-long tunnel. The channeling 
of indole and the coupling of the activities of the two sites are controlled by 
allosteric signals derived from covalent transformations at the β-site that 
switch the enzyme between an open, low-activity state, to which ligands 
bind, and a closed, high-activity state, which prevents the escape of indole. 



THE PHENOMENON OF direct metabolite transfer between sequential enzyme
pairs in a metabolic cycle is classified as substrate channeling. The
tryptophan synthases from enteric bacteria, with subunit composition
α2β2, are the best-characterized examples of substrate-channeling,
multienzyme complexes. These enzymes catalyse the last two steps (Fig. 1)
in the biosynthesis of L-tryptophan (L-Trp). The α-subunit has an
(α/β)8-barrel folded motif, and catalyses the cleavage of
3-indole-D-glycerol 3'-phosphate (IGP) to indole and D-glyceraldehyde
3'-phosphate (G3P). The β-subunit is a pyridoxal phosphate-requiring
enzyme that catalyses the conversion of L-serine (L-Ser) and indole to
L-Trp and a water molecule. The β-reaction occurs in two stages; in
Stage I, L-Ser reacts with enzyme-bound pyridoxal 5'-phosphate (PLP) to
form the quasi-stable α-aminoacrylate intermediate, E(A-A), the species
poised for reaction with indole; in Stage II, indole reacts with E(A-A) to
form L-Trp (Fig. 1). Efficiency is achieved by channeling the product of
the first enzyme (indole) to the second enzyme from the α-site to the
β-site through a 25 Å-long tunnel (Fig. 2). 
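
The reactions described above can be summarized as a simple scheme. This is a sketch assembled from the abbreviations used in the text (IGP, G3P, PLP, E(A-A)); writing the resting enzyme as E(PLP) and assigning the release of water to Stage I are assumptions based on standard PLP β-elimination chemistry, since the text states only that water is a product of the overall β-reaction.

```latex
\begin{align*}
\text{$\alpha$-reaction:} \quad
  & \mathrm{IGP} \longrightarrow \text{indole} + \mathrm{G3P} \\
\text{$\beta$-reaction, Stage I:} \quad
  & \text{L-Ser} + \mathrm{E(PLP)} \longrightarrow \mathrm{E(A\text{-}A)} + \mathrm{H_2O} \\
\text{$\beta$-reaction, Stage II:} \quad
  & \text{indole} + \mathrm{E(A\text{-}A)} \longrightarrow \text{L-Trp} + \mathrm{E(PLP)} \\
\text{Overall $\beta$-reaction:} \quad
  & \text{L-Ser} + \text{indole} \longrightarrow \text{L-Trp} + \mathrm{H_2O}
\end{align*}
```

Adding Stage I and Stage II cancels E(PLP) and E(A-A) and recovers the overall β-reaction, and indole appears as the product of the α-reaction and the substrate of Stage II, which is the step served by the tunnel.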

Here, we review recent findings showing that, in the overall catalytic
cycle, β-site covalent reactions with substrate trigger allosteric signals
that flip the enzyme between open (low activity) and closed (high
activity) conformations. These events serve two functions: (1) conversion
to the closed state prevents the escape of indole, and (2) switching
between activity states couples the catalytic cycles of the two enzymes. 



Physical and dynamic constraints are 
dictated by function 

The elegant crystallographic work of Hyde et al. and Rhee et al. on the
Salmonella typhimurium tryptophan synthase bienzyme complex (α2β2) has
contributed significant advances to our knowledge of structure-function
relationships in substrate channeling by multienzyme complexes. Their
efforts have provided the first example showing the three-dimensional
structure of the molecular machinery required for channeling in a stable
multienzyme complex. Out of this structural work and from recent
mechanistic studies there has emerged the realization that, for efficient
substrate channeling to occur between enzyme pairs, a rather stringent set
of physical and dynamic constraints must be met. Those evident in the
tryptophan synthase example so far are as follows: (1) The architecture of
the multienzyme complex must provide a physical structure with dynamic
properties that constrain the degrees of freedom of the common metabolite
so that transfer from one site to the next is assured. (2) Catalysis at
the two active sites must be coupled such that turnover at each site
occurs in phase. 

The structure of α2β2 partially explains how the first constraint is
satisfied; the α- and β-sites of each heterologous dimer are connected by
a 25 Å-long tunnel (Fig. 2). The kinetic studies of Dunn et al., Lane and
Kirschner and Anderson et al. have established that the tunnel actually
functions as the conduit for the transfer of indole between sites. 

An interconnecting tunnel is insufficient to 
ensure channeling 

Two additional criteria must be met to achieve efficient phasing of the
vectorial transfer of indole with the β-site chemistry (Fig. 1) so that
L-Trp is efficiently 

Contributed by Tony Hunter, September 7, 1999 

Caenorhabditis elegans should soon be the first multicellular or- 
ganism whose complete genomic sequence has been determined. 
This achievement provides a unique opportunity for a comprehen- 
sive assessment of the signal transduction molecules required for 
the existence of a multicellular animal. Although the worm 
C. elegans may not much resemble humans, the molecules that 
regulate signal transduction in these two organisms prove to be 
quite similar. We focus here on the content and diversity of protein 
kinases present in worms, together with an assessment of other 
classes of proteins that regulate protein phosphorylation. By sys- 
tematic analysis of the 19,099 predicted C. elegans proteins, and 
thorough analysis of the finished and unfinished genomic se- 
quences, we have identified 411 full length protein kinases and 21 
partial kinase fragments. We also describe 82 additional proteins 
that are predicted to be structurally similar to conventional protein 
kinases even though they share minimal primary sequence iden- 
tity. Finally, the richness of phosphorylation-dependent signaling 
pathways in worms is further supported with the identification of 
185 protein phosphatases and 128 phosphoprotein-binding do- 
mains (SH2, PTB, STYX, SBF, 14-3-3, FHA, and WW) in the worm 
genome. 

Reversible protein phosphorylation plays a central role in 
regulating basic functions of all eukaryotes such as DNA 
replication, cell cycle control, gene transcription, protein trans- 
lation, and energy metabolism. Protein phosphorylation is also 
required for more advanced functions in higher eukaryotes such 
as cell, organ, and limb differentiation, cell survival, synaptic 
transmission, cell-substratum and cell-cell communication, and 
to mediate complex interactions with the external environment. 
Because aberrant protein phosphorylation is commonly the 
cause of cancer and other human diseases, a comprehensive 
knowledge of the key enzymes that regulate these functions can 
provide the basis for novel therapeutic intervention strategies. 

The genomic revolution promises to provide a new paradigm for 
drug discovery, allowing one to selectively target the molecular 
basis of human disease. The completion of the Caenorhabditis 
elegans genome sequence gives us an opportunity to decipher the 
molecular nature of its signal transduction machinery. Several 
global analyses of proteins and protein domains present in C.
elegans have been presented elsewhere (1-4), revealing that protein
kinases comprise the second largest family of protein domains in 
worms. The three most frequently occurring protein domains found 
in worms are seven transmembrane chemoreceptors (650 domains, 
3.5% of genome), protein kinases (496 domains, 2.6% of genome), 
and zinc finger C4 domains, including nuclear hormone receptors 
(275 domains, 1.4% of genome). A more in-depth analysis has been 
performed on the 535 worm proteins containing zinc-binding
domains, including the C4, C2H2, and C3HC4 ring finger types (3),
and on the 83 worm homeobox transcription factors (4). Here, we 
present a comparative analysis of the enzymes and adaptor mole- 
cules that are the key components of the protein phosphorylation 
signaling network present in C. elegans.

Identification and Classification of C. elegans Protein Kinases. To
identify worm protein kinases, we first used an hmmer 2.1.1
(http://hmmer.wustl.edu/) profile search against the 19,099 predicted
worm proteins, the finished and unfinished C. elegans genomic 
sequence, and the worm chromosome assemblies. The nucleic acid 
databases were first translated in all six frames, and ORFs longer 
than 30 amino acids were parsed into a relational database. We 
generated a hidden Markov model based on 70 representative yeast 
and human protein kinases whose catalytic domains share <50% 
sequence identity with each other (5). Using a similar strategy, 
additional profiles were generated for other protein kinase-like 
domains (phosphoinositide kinases, atypical A6 kinases, diacylglyc- 
erol kinases, aminoglycoside resistance kinases, and microbial 
kinases), protein phosphatases, and domains capable of specifically 
binding to phosphotyrosine (P.Tyr) or phosphoserine/threonine 
residues (SH2, PTB, STYX, SBF, 14-3-3, FHA, and WW domains). 
Scripts were written for reassembly of contiguous exons identified 
from genomic sequence to generate the predicted catalytic domain 
sequence of each kinase. Pairwise blast 2.0
(ftp://ncbi.nlm.nih.gov/blast/executables/) analysis was performed to
identify redundant
entries, and putative protein kinases with low profile scores were 
manually inspected to determine whether they should be included 
in subsequent analyses. 
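
The six-frame translation and ORF-parsing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scripts: the function names are hypothetical, an ORF is assumed here to be any stretch between stop codons, and the standard genetic code is assumed throughout.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate DNA codon by codon; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_orfs(dna, min_len=30):
    """Yield (strand, frame, peptide) for stop-to-stop ORFs longer than min_len.

    Assumption: an ORF is any run of residues between stop codons, in any of
    the three frames on either strand.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            for segment in translate(seq[frame:]).split("*"):
                if len(segment) > min_len:
                    orfs.append((strand, frame, segment))
    return orfs
```

In a pipeline of the kind described, the peptides returned by `six_frame_orfs` would then be loaded into a database and searched with the kinase profile.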

This analysis generated a nonredundant list of 493 protein 
kinase-like proteins and 21 protein kinase gene fragments from 
worms. This number will continue to increase as the genome is 
completed and the final assembly of the six worm chromosomes is 
achieved. Of note, we found >40 kinase domains from genomic 
analysis that were absent in the 19,099 worm protein dataset. These 
omissions result from the limitations of current protein prediction 
algorithms. Furthermore, numerous entries had apparent internal 
deletions of conserved kinase motifs, likely attributable to inap- 
propriately assigned splice junctions. These sequences were cor- 
rected before further classification. Many of the 19,099 proteins 
were alternate isoforms of the same gene, in which case we included 



Abbreviations: PKA, protein kinase A; MAPK, mitogen-activated protein kinase; CDK,
cyclin-dependent kinase; PTK, protein-tyrosine kinase; RTK, receptor protein-tyrosine
kinases; CTK, cytoplasmic protein-tyrosine kinases; STAT, signal transducer and activator of
transcription; IRS, insulin receptor substrate; NLK, NEMO-like kinase; APH, aminoglycoside
phosphotransferases; PTP, protein-tyrosine phosphatase.
Fig. 1. Hyperbolic tree representation of C. elegans protein kinases. Major
protein kinase groups are labeled in different colors. A java tool for viewing this 
dendrogram can be found at www.kinase.com. 



only one of the proteins in our final assessment. In determining the 
total number of protein kinases, the three proteins determined to 
contain dual catalytic domains were only counted once. Many of the 
protein ORFs truncated the extremities of the kinase domain 
proteins, frequently because of their location near the end of a 
cosmid clone. In these cases, we searched for N- or C-terminal 
domains on adjacent cosmids to assist in the subsequent classifica- 
tion. One challenge of genomic data mining is the presence of 
sequence repeats. Tandem repeats and inverted repeats account for 
2.7 and 3.6% of the worm genome, respectively. In addition, worms 
contain large regions of tandem gene duplication, ranging from 
hundreds of bases to >100,000 bases (1). In some cases, the genes 
encoded within these regions are duplicated and have nearly 
identical sequences. Therefore, until the chromosome sequences 
are fully assembled, data-mining approaches may exclude some of 
these duplicated genes. 

A multiple sequence alignment was generated from the predicted 
catalytic domains of 398 of these protein kinases, which share >15%
amino acid identity with other entries. The aligned proteins were 
then clustered by using parsimony analysis, and the results were 
displayed as rooted and unrooted cluster dendrograms, and as 
kinase "retinograms" or hyperbolic trees using a JAVA display tool 
(Fig. 1 and www.kinase.com). The protein kinases were then
classified into several kinase groups and families, based on relat- 
edness within the kinase catalytic domain to other worm, yeast, and 
vertebrate protein kinases. Further classification was performed by 
searching for noncatalytic domains linked to the kinase domain, 
including predicted transmembrane regions, SH2 domains and SH3 
domains, and Ig and fibronectin Type III domains. 
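
The >15% identity criterion used to select catalytic domains for the alignment can be illustrated with a short Python sketch. The text does not say how identity was computed, so the definition below (matches divided by ungapped aligned columns) and the function names are assumptions made for illustration only.

```python
def percent_identity(a, b):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def entries_above_threshold(aligned, threshold=15.0):
    """Keep entries sharing more than `threshold` percent identity with any other entry."""
    names = list(aligned)
    keep = []
    for name in names:
        if any(
            percent_identity(aligned[name], aligned[other]) > threshold
            for other in names
            if other != name
        ):
            keep.append(name)
    return keep
```

Entries that fall below the threshold against every other sequence would be set aside before clustering, which is one plausible reading of why only 398 catalytic domains entered the alignment.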

Table 1 presents a summary of our classification of the 411 
protein kinases and 82 protein kinase-like motifs. A more detailed 
table of these proteins, along with basic informatics tools for 
retrieval and alignment of these sequences can be found on our web 
site at www.kinase.com. Table 1 also summarizes the results of a 
similar analysis of the completed yeast genome and of an ongoing 
effort from publicly available human expressed sequence tag and 
genomic databases. From this classification, we can now determine 
which protein kinases are conserved between yeast and worms, we 
can speculate on the origin of the protein kinase superfamily, and 
we can identify kinases that are yeast-specific and those that are 
restricted to higher eukaryotes. We tentatively identify "worm- 
specific" protein kinases, based on their absence from current 



Table 1. Summary and classification of phosphoprotein signaling 
molecules in worms, budding yeast, and humans 
CKI
CMGC
42
0
163
28
100
2
8
YLK1
30
4
14
0
DSP
16
51


SIP            65     0    18     21
IPP            11     0     7      7
              185    18    44    126
SH2            73     1     1   >137
PTB            16     0     0    >47
STYX (DSP)      1     0     5      2
SBF (MTM)       2     0     0      3
14-3-3          3     0     2     >6
FHA            11     0    14    >20
WW             22     0     5    >32
All           128     1    27   >247
Cyclins        34     0    23    >21



mammalian expressed sequence tag and nucleic acid databases. 
However, a final assessment will have to await completion of the 
Drosophila and human genome sequences. We also elaborate on 
some of the protein kinases and signaling pathways that evolution- 
arily appear only in more complex organisms such as vertebrates. 

In this review, we use the term "orthologues" to refer to proteins 
of different species that are believed to have a common ancestor 
and have an evolutionarily conserved function. Orthologous pro- 
teins typically have similar domain structure and share extended 
sequence similarity outside of their catalytic domains. Homologous 
proteins also share extended sequence similarity, but to a lesser 
degree than orthologues, and are not expected to complement one 
another functionally. However, within large protein superfamilies 
such as protein kinases, G protein coupled receptors, and nuclear 
hormone receptors, there is not a single expectation value that can 
be used to categorize all members definitively, and final classifica- 
tion will require experimental validation. 

Yeast- and Fungal-Specific Kinases. The first complete eukaryote
sequence, that of the budding yeast Saccharomyces cerevisiae, was
reported in 1996 (6). Shortly thereafter, we presented a compre- 
hensive analysis and classification of yeast protein kinases (7). Now, 
with the availability of a second eukaryotic genome, C. elegans, we 
can perform a similar analysis and make more informed generalizations
on which of these protein kinases are unique to yeast or
fungi, and also on which protein kinases evolved during the 
emergence of multicellular organisms and are therefore not rep- 
resented in yeast or fungi. 

We now identify a total of 24 yeast-specific protein kinases and 
an additional 3 that are currently restricted to yeast and worms. 
Originally we defined four protein kinase subfamilies, containing a 
total of 18 members, to be yeast specific [protein kinase A (PKA)- 
related, RAN, ELM, and NPR/HAL5 families]. These remain 
yeast- or fungal-specific, as no close homologues are present in 
worms, and none have yet been described in vertebrates. However, 
the ELM family could be considered as a subfamily of the CAMK 
group. Rim15 is a yeast-specific kinase that is related to Schizosaccharomyces
pombe Cek1, and its similarity to budding yeast
YNL161W places it as a distant member of the NDR family kinases. 
Two other protein kinase subfamilies, containing a total of five 
members, were originally recognized as having only distant homo- 
logues in higher organisms (NEK-like and PIM-like families). The 
prototype of the NEK-like family, YNL020C, has a homologue in 
worms, but not in mammals, although its C-terminal tail has a 
predicted coiled-coil structure related to numerous mammalian
protein kinases (e.g., SLK/PLKK, TAK1). The two yeast PIM-like
family members have catalytic domains related to worm and 
mammalian protein kinases, but have a unique N-terminal domain. 

Members of the NPR/HAL5 family are involved in ion ho- 
meostasis, polyamine transport, nutrient uptake, and response to 
nitrogen starvation, whereas Elm1 initiates a protein kinase cascade
controlling pseudohyphal growth (8). Members of the RAN family
are related to fission yeast Ran1/Pat1, which regulates the switch
between vegetative growth and meiosis. Because these are fungal- 
specific responses, it is not surprising that these protein kinases are 
restricted to lower eukaryotes. 

A second set of "unique" yeast protein kinases was originally 
defined because they had no close homologues in other species (7). 
Most of these yeast protein kinases now have both worm and 
vertebrate orthologues (Cdc5, Ipl1, Ire1, Vps15, YGL180W/Apg1,
Swe1, Spk1, Gcn2, YBR274W, YGR262C, and Bub1). Exceptions
among this list of unique yeast protein kinases are YPL236C and
Mps1, which have orthologues in humans, but not in worms;
YKL116C, which is distantly related to the EMK-family, yet has
only weak homologues in worms and humans; and YKL171W,
YGR052W, and YPR106W, which remain yeast-specific protein
kinases. Two sequences that were excluded from our previous
analysis of yeast protein kinases deserve mention. The budding
yeast protein Iks1 can be classified as a yeast-specific protein kinase
because it still has no homologues in worms or other species,
whereas another yeast kinase-like sequence, SCY1, has orthologues
in C. elegans and Arabidopsis, but none thus far in vertebrates. An S.
pombe protein, which is distantly related to SCY1, also has a single
worm orthologue. 

Worm-Specific Protein Kinases. Which protein kinases are specific to 
worms? Protein kinases that are absent from yeast yet present in 
worms are likely to be involved in the complex signal transduction 
pathways that are required for the existence of multicellular or- 
ganisms. These might include protein kinases involved in cell- 
substratum and cell-cell adhesion, transmembrane signaling in 
response to humoral factors, protein kinases involved in cell survival 
or programmed cell death, and protein kinases whose signals 
regulate metazoan-specific transcription factors, particularly those 
containing Zn-finger domains. 

In the absence of complete genome sequences of other multi- 
cellular eukaryotes, we tentatively classify 165 protein kinases (plus 
9 protein kinase fragments) as worm-specific. The majority (134, 
80%) fall into three groups (CK1, FER, and KIN-15) whereas the
others are distant members of common protein kinase families or 
belong to worm specific subfamilies. Five protein kinase subfami- 
lies, containing a total of 12 members, can tentatively be defined as 
worm-specific (C04G2.10, K08B4.5, K09C6.7, R107.4, and
ZK177.2-families). An additional 15 unique worm protein kinases
are also identified, which to date have no close homologues in yeast, 
worms, or in higher organisms. However, mammalian homologues 
of some of these worm protein kinases are already beginning to 
appear in publicly available expressed sequence tag databases, and 
assignment of a protein kinase as being truly worm-specific will 
have to await the completion of the Drosophila and human genome 
sequences. 

Members of four other protein kinase or kinase-like subfamilies 
are disproportionately represented in worms compared with hu- 
mans. Clusters of 5-9 members of each of these families are 
localized to short regions (<1 megabase) of chromosomes II and 
IV, suggesting they may each have expanded as a result of extensive 
tandem gene duplication. The chromosomal density of protein 
kinases is graphically depicted on our web site at www.kinase.com. 
The four gene families are the CK1-family, the KIN-15-family of
receptor protein-tyrosine kinases, the FER-family of cytoplasmic
protein-tyrosine kinases, and the kinase-like domains of the receptor
guanylyl cyclases.

CK1 family. The worm genome contains 87 CK1 (casein kinase
I) members (plus 7 additional partial catalytic domains) whereas
there are only 4 known members in budding yeast and 6 in humans.
Genetic evidence from the yeast homologues suggests CK1s may be
involved in DNA repair and cell division, and mammalian CK1s
have been shown to phosphorylate p53 in G1 and G2, possibly
affecting cell sensitivity to DNA damage at these checkpoints (9).
Little is known regarding the function of CK1s in worms, but the
enormous arborization and diversification of this kinase family may 
be an adaptation allowing for enhanced DNA repair in response to 
excessive exposure to environmental mutagens. 

KIN-15/16 family. C. elegans contains 16 members of a unique
family of receptor protein-tyrosine kinases whose presence to date 
is restricted to this species. These transmembrane proteins have 
unusually short (<50-aa) extracellular domains, and many are 
clustered within the genome, as though they arose through tandem 
gene duplication. The prototype members of this family, KIN-15 
and KIN-16, are expressed in the hypodermal syncytium, which 
expands by cell fusion during larval development (10). Compared 
with wild-type worms, KIN-15 and KIN-16 deletion mutants pro- 
duce fewer embryos and rarely develop into adults, but, when they 
do mature, they typically exhibit extrusion of the gonads through the 
vulva (11). Therefore, KIN-15/16 appear to be essential genes, yet 
may undergo variable compensation by 1 of the 14 other homo- 
logues. One of the KIN-15 clusters is interspersed with chitinase 
genes, which are known to function in cell wall morphogenesis 
during the molting process and in fungal resistance. Expansion of 
this region may have been necessary during evolution to facilitate 
this aspect of larval development. An alternative function for 
KIN-15-family kinases is suggested by the fact that overexpression
of TKR-1 (C08H9.5) causes a 40-100% extension of life expectancy
in worms (12). Unlike other life extension (age) mutants, TKR-1 
transgenics do not form dauers, and their longevity has been 
attributed to an increased resistance to ultraviolet and thermal 
stress. 

FER family. The worm genome contains 42 members (plus 2 
additional partial catalytic domains) of the FER-family of single 
SH2-containing cytoplasmic protein-tyrosine kinases. Most of 
these genes are interspersed throughout the worm genome; however,
nine members reside within a 1.1-megabase region on chromosome
IV. Unfortunately, no literature is available on the function
of any of these protein kinases in worms, but the two mammalian
homologues, FER and FES, have been demonstrated to play
a role in cell adhesion, to signal downstream of cytokine receptors, 
and to function as oncogenes (13). Conceivably, additional human 
representatives will be revealed on completion of the human 
genome sequence, possibly with restricted expression. Alterna- 
tively, their function may be replaced in humans by expansion and 
diversification of non-FER cytoplasmic protein-tyrosine kinases, of
which worms have only 10 whereas humans have at least 34. Most 
evident is a dramatic expansion of SRC-family kinases and emer- 
gence of ZAP70 and JAK family kinases in higher eukaryotes that 
are not found in the worm genome. 

Conserved Metazoan Protein Kinase Signaling Transduction Pathways. 

Worms provide an elegant model system for studying signal trans- 
duction. This transparent animal is comprised of 959 somatic cells 
plus 131 cells destined for programmed cell death. The C. elegans
hermaphrodite contains 302 neurons and 81 muscle cells and has a 
brain, reproductive system, and digestive tract (ref. 14;
http://dauerdigs.biosci.missouri.edu/Dauer-World/Wormintro.html). It
provides a complex yet tractable system for studying development, 
metabolism, aging, and behavioral responses to a number of stimuli. 
Regulation of many of these processes is carried out through signal 
transduction pathways that are also present in humans. Not sur- 
prisingly, all of the major protein kinase groups found in worms are 
also conserved in humans (15). The number of protein kinases 
classified into each major group from yeast and worms, along with 
a current estimate from humans, is provided in Table 1. These 
numbers represent a current analysis, but new protein kinases are 
being discovered every month as the worm genome sequencing 
project continues. Some of these entries may also represent pseu- 
dogenes containing frameshifts that result in incomplete translation 
into a full kinase catalytic domain. 

AGC Group. The AGC group of worm protein kinases contains 
representatives of many of the known types of cyclic nucleotide- 
dependent, NDR or DBF2, and ribosomal S6 kinase families. 
Worms also contain members of the cGMP-dependent kinase 
(PKG), RSK, and G-protein coupled receptor kinase families that 
are absent from budding yeast. Two of the S6 kinase members have 
dual catalytic domains similar to vertebrate RSK enzymes, where 
the N-terminal domain clusters into the AGC group and the 
C-terminal kinase domain is most related to the CaMK group. 
Worms have four members of the AKT family, two being close 
orthologues of mammalian AKT1/PKB/RACα, and two related to
the AKT upstream kinase, PDK1. AKT is a mammalian protooncoprotein
regulated by phosphatidylinositol 3-kinase (PI3-K),
which appears to function as a cell survival signal to protect cells
from apoptosis (16). Insulin receptor, RAS, PI3-K, and PDK1 all
act as upstream activators of AKT whereas the lipid phosphatase 
PTEN functions as a negative regulator of the PI3-K/AKT pathway 
(17). Downstream targets for AKT-mediated cell survival include 
the proapoptotic factors BAD and Caspase9 and transcription 
factors in the forkhead family, such as DAF-16 in the worm. AKT 
is also an essential mediator in insulin signaling, in part because of 
its use of GSK-3 as another downstream target. Each of these 
components of the AKT/PI3-K pathway is conserved in worms, 
providing a powerful system for genetic dissection of a major cell 
survival signal. 

The cAMP-dependent protein kinases (PKA) consist of het- 
erotetramers comprised of two catalytic (C) and two regulatory (R) 
subunits, in which the R subunits bind to the second messenger 
cAMP, leading to dissociation of the active C subunits from the 
complex. Worms have two PKA catalytic domains and two regu- 
latory subunit genes (R07E4.6 and ZK370.4). Additional cNMP- 
binding domains are present in the two worm representatives of the 
PKG family, in several cNMP-gated ion channels, and in a cAMP- 
regulated guanine nucleotide exchange factor (T20G5.5). 

CaMK Group. In the CaMK group, the most abundant representatives
include Ca2+/calmodulin-regulated and AMP-dependent
protein kinases and EMK-related kinases. Worms also contain 
members of the death-associated protein kinase, mitogen-activated 
protein kinase (MAPK)-associated protein kinase, myosin light 
chain kinase, and phosphorylase kinase families that are absent
from budding yeast. All of these protein kinase families have likely
evolved as a result of the demands of multicellularity and the 
emergence of complex organ systems. For example, even though 
yeast have myosin homologues, they lack myosin light chain kinases. 
These protein kinases have presumably evolved to regulate myosin 
during muscle contraction. A worm contig still under construction 
appears to contain a phosphorylase kinase catalytic γ subunit
orthologue, consistent with the presence of two orthologues of the
noncatalytic phosphorylase kinase α subunits, which facilitate
calmodulin-binding and are required for activation of the mamma- 
lian holoenzyme. 

Worms lack a homologue of the mammalian Trio-family kinases. 
Trio is a large multidomain protein kinase containing Ras and Rho 
guanine exchange factor domains in addition to PH, SH3, and 
spectrin domains (18). Trio may link Rho and Rac signaling 
pathways and appears to be involved in the cytoskeletal changes 
required for cell migration. Although worms lack a member of this 
kinase family, they do have at least two proteins related to the entire 
noncatalytic domain of Trio (UNC-73 and F55C7.7). 

We have also identified a forkhead-associated (FHA) domain-containing
CHK2 orthologue in worms. In yeast, Spk1/Rad53
functions as a DNA damage checkpoint sensor through its FHA
domain interacting with phosphorylated Rad9 (19). Although no
close orthologue of Spk1 exists in metazoans, this function appears
to be replaced by CHK2/CDS1, which is phosphorylated in response
to DNA damage and may work in conjunction with CHK1
kinase to phosphorylate CDC25C to prevent premature entry into 
mitosis (20). 

CMGC Group. In the CMGC group of serine/threonine kinases, all 
of the main subfamilies are conserved between yeast, worms, and 
mammals, including cyclin-dependent kinase (CDK), MAPK, 
GSK-3, and CLK. An exception is the RCK family, which is absent
from yeast but has two members in worms and at least seven in 
humans. The worm RCK kinases are most similar to mammalian 
MAK, or male germ cell-associated kinase, which has been impli- 
cated in spermatogenic meiosis and in signal transduction pathways 
for sight and smell. Worms have 14 CDKs (compared with 5 CDKs 
in yeast) including orthologues of CDC2, CDK3, CDK5, CDK7, 
and CDK8, and contain 34 cyclins, compared with 23 in budding
yeast (Table 1), including one cyclin H orthologue, which we predict 
will interact with worm CDK-7 to generate a functional cyclin- 
activated kinase. 

Worms have 14 MAPKs, compared with 6 in yeast and at least 
14 in humans. The worm MAPKs include representatives of each 
of the major types of MAPKs: ERK/MAPK, JNK/SAPK1, p38/SAPK2,
BMK/ERK5, and NEMO-like kinase (NLK) (21). In
budding yeast, three protein kinase families (the prototypes being
Ste20, Ste11, and Ste7) function upstream of the MAPKs to
generate at least four distinct MAPK signaling pathways that 
mediate the response to pheromone, nutritional starvation, and 
cellular or osmotic stress. In multicellular organisms, these MAPK 
cascades have evolved to mediate responses to diverse signals 
including growth factors, mitogens, hormones, and cytokines, in 
addition to the more primitive stress responses to anoxia, heat 
shock, and osmotic stress. 

STE Group: MAPK Pathways. The STE family refers to the three 
classes of protein kinases that lie sequentially upstream of the 
MAPKs. In worms, this group includes 10 STE7 (MEK or 
MAPKK) kinases, 2 STE11 (MEKK or MAPKKK) kinases, and 12
STE20 (MEKKK) kinases. Based on the number of MAPK and 
STE-family kinases in C. elegans, we predict worms will contain at 
least 8-10 MAPK pathways. In humans, several protein kinase 
families that bear only distant homology with the STE11 family also
operate at the level of MAPKKKs, including RAF, MLK, TAK1,
and COT. Except for COT, worms also have orthologues of each 
of these kinases. Because crosstalk takes place between protein 
Fig. 2. Schematic representation of the human and C. elegans receptor protein-tyrosine kinase families. Catalytic domains are shown in yellow. The names of the
human RTKs are in black, and the names of the worm RTKs are in red. 



kinases functioning at different levels of the MAPK cascade, the 
large number of STE family kinases could translate into an enor- 
mous potential for upstream signal specificity and diversity. 

Protein-Tyrosine Kinase Group: Receptor Protein-Tyrosine Kinases 
(RTKs). The largest group of protein kinases in worms are the
protein-tyrosine kinases (PTKs), with 92 members and 5 fragments.
We predict this will also remain the largest group of protein kinases 
in higher eukaryotes, including humans, where the current count is 
≈100. These numbers are impressive when one considers that this
family is absent from budding yeast. Yeast, however, do have a 
"budding" tyrosine phosphorylation signaling system, with several 
dual-specificity kinases (CLK-like, Ste7/MEK family, Swe1, Spk1/Rad53,
Mps1), an atypical A6 PTK, 3 protein-tyrosine phosphatases,
16 dual-specificity and low molecular weight phosphatases,
and 6 "infant" P.Tyr-binding proteins comprising an apparently 
nonfunctional SH2 domain protein and 5 phosphatase-like STYX 
domains. Budding yeast lack PTB domains, and none of the six 
potential P.Tyr-binding domains have been functionally verified. 

The 92 worm PTKs can be further classified into receptor 
protein-tyrosine kinases (RTKs) and cytoplasmic protein-tyrosine 
kinases (CTKs) based on the presence or absence of a transmem- 
brane domain and SH2 or SH3 domains. Based on this analysis, the 
worm genome contains 40 RTKs and 52 CTKs. The 40 RTKs 
include 16 members of the worm-specific KIN-15-family, 13 RTKs 
with orthologues representing 10 of the 20 families of human RTKs, 
and 11 RTKs that remain unclassified with no identifiable mammalian
counterpart (Fig. 2). Genetic studies in worms support the
classification of five of these worm-human pairs, including LET-23/EGF
receptor, DAF-2/insulin receptor, EGL-15/FGF receptor,
CAM-1/ROR1 receptor, and VAB-1/EPH receptor, and each
of these orthologous pairs mediates similar functions in worms and 
man, with specificity for epidermal, metabolic, mesodermal, and 
neuronal signaling pathways. 
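
The RTK/CTK split and the counts quoted above can be checked with a small Python sketch. The classification rule is reduced here to a single assumption (a protein-tyrosine kinase with an annotated transmembrane region is treated as an RTK, anything else as a CTK), and the function and variable names are hypothetical; only the numbers come from the text.

```python
def classify_ptk(domains):
    """Return 'RTK' when a transmembrane region is annotated, otherwise 'CTK'.

    `domains` is a set of domain labels for one kinase; the labels themselves
    are illustrative and not taken from any particular annotation database.
    """
    return "RTK" if "transmembrane" in domains else "CTK"


# Counts quoted in the text for the worm protein-tyrosine kinases.
RTK_BREAKDOWN = {
    "KIN-15 family": 16,
    "orthologues of human RTK families": 13,
    "unclassified": 11,
}
CTK_TOTAL = 52
PTK_TOTAL = 92


def tallies_consistent():
    """Check that the RTK subgroups sum to 40 and that RTKs plus CTKs give 92."""
    rtk_total = sum(RTK_BREAKDOWN.values())
    return rtk_total == 40 and rtk_total + CTK_TOTAL == PTK_TOTAL
```

The tally confirms that the three RTK subgroups (16 + 13 + 11) account for all 40 RTKs, and that 40 RTKs plus 52 CTKs give the 92 worm PTKs.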

Based on extracellular domain homologies, we also predict three 
worm orthologues of PDGFR/FLK/VEGFR, two for DDR, and 
one each of RYK, ROS, and LTK/ALK. Two of the unclassified 
RTKs have weak similarity to MET, but not enough to warrant 
inclusion into this family. Missing in C. elegans are TRK/nerve
growth factor receptors, AXL/TYRO3, TIE/angiopoietin receptor,
RET/GDNF receptor, and MUSK family members. Identification
of three members of the PDGFR/VEGFR family is significant,
as they emerged only through analysis of the genomic data
and failed to be properly identified from a recent analysis of the 
predicted 19,099 proteins. Each of these receptors contains multiple 
Ig-like extracellular domains and a single split kinase domain with 
closest homology to human FLT1/VEGFR1 and the C. elegans
KIN-15 family. However, they are likely to represent early ancestors
to both the FLK and PDGFR kinase lineages. Expression of the 
mammalian FLK/VEGFR RTKs is primarily restricted to endothelial
cells, and they play important roles in the early differentiation
of hematopoietic and endothelial lineages as well as in normal
and pathologic angiogenesis in the adult. However, because worms 
lack a vasculature, the function of these receptors is not obvious. 
The formation of mammalian vasculature is reminiscent of the 
process by which networks of branching tubes develop into the lung 
and kidneys. Invertebrate VEGFRs may therefore be involved in 
processes that later evolved into a program for limb and organ 
development in vertebrates. 

Surprisingly little is known about how the ligand-activated 
VEGFRs mediate these effects. Gene knockout studies in mice 
suggest that A-RAF or MEKK1 may function downstream of
VEGFRs, and recent evidence implicates the involvement of 
STATs (signal transducer and activator of transcription) in VEGFR 
signaling (22). Genomic analysis reveals two worm orthologues of 
STATs (Y51H4, Y43D4 unfinished and F58E6.1), making the 
VEGFR-STAT association an attractive area for further investiga- 
tion. STATs contain an SH2 domain, a tyrosine phosphorylation 
domain, and a DNA-binding domain, and function in a unique 
JAK-STAT signaling pathway. Extensive studies in mammalian 
systems have established a model in which JAK kinases are 
constitutively bound to the cytoplasmic portion of cytokine recep- 
tors and are activated on receptor dimerization, facilitating recruit- 
ment of STATs to the receptor complex. Subsequent STAT phos- 
phorylation leads to their dimerization and translocation to the 
nucleus, where they function directly as transcription factors. Dro- 
sophila and Dictyostelium STATs both regulate cell division and 
pattern formation (23, 24). Drosophila STAT has been genetically
and biochemically linked to a JAK-STAT signal transduction
pathway that regulates pair-rule genes and hematopoiesis. Dictyostelium
STAT plays an essential role during the differentiation and
aggregation of independent spore cells into stalk cells in response 
to the chemical signal referred to as differentiation-inducing factor. 
Furthermore, the Dictyostelium AX2 PTK has a second kinase-like 
domain found only in JAK-family kinases, suggesting the existence 
of a signaling network similar to that in flies and mammals. 
However, worms have no cytokines, no cytokine receptors, and no 
JAK-family kinases. Possibly, the JAK kinase function is replaced 
by a worm-specific FER kinase, or the STATs may have initially
evolved to serve an alternative purpose. Mammalian STATs are 
also invoked in signaling through receptors for growth factors such 
as EGF, PDGF, VEGF, and angiopoietins. Because the EGF and
VEGF signaling systems are present in worms, it is tempting to
speculate that these represent the primordial raison d'etre for
STATs.

In general, related RTKs bind related ligands. In humans, there 
are at least 12 ligands, encoded by 10 genes, that have been shown 
to bind selectively to at least one of the four known EGFR-family 
members. Each of these ligands shares a conserved six-cysteine 
pattern in its receptor binding domain. In worms, LIN-3 has been 
shown to function as a LET-23 ligand. Although EGF motifs are 
prevalent in worms, we have identified three EGF-like proteins 
(F58G4.4, Y69H2.2, YG70G10A.2) that, in addition to the six 
cysteines, conserve many of the crucial receptor-binding residues 
and are juxtaposed next to a putative transmembrane domain, in a 
pattern similar to all known EGFR-family ligands. Worms also 
contain at least 3 FGF-like ligands, 12 insulin-like ligands (many 
more on inclusion of relaxin-related ligands), 2 distant homologues 
of VEGF, and 4 ephrin-related ligands, some of which would be 
predicted to bind to their cognate receptors. 

Orthologues of other RTK ligands prove more difficult to identify 
empirically. We see no evidence for a bona fide PDGF or NGF, and 
searches for ligands for MET, TIE, and AXL-family RTKs are con-
founded by their similarity to plasminogen, fibrinogen, and fibrillin,
respectively. Furthermore, except for weak homologues of MET, these
three RTK families are absent from worms. Nevertheless, the signifi- 
cance of a putative Ang2-like protein (Y43C5A2) in the absence of a 
TIE-family RTK remains to be determined. 

Protein-Tyrosine Kinase Group: CTKs. Most of the 52 CTKs in worms 
belong to the single SH2-containing FER family. Of the remaining
10 CTKs, there are 2 orthologues of the SH3-containing ACK, and 
1 each of FYN (SRC family CTK), FRK, CSK, ABL, and FAK, plus 
3 unclassified CTKs. In vertebrates, CSK negatively regulates
FYN-family kinases by phosphorylation of a C-terminal tyrosine 
facilitating a conformational change through an intramolecular 
SH2-P.Tyr interaction (25). We predict a similar functional inter-
action between worm FYN and CSK. Co-evolution of this regula-
tory pair suggests even early metazoans required a means to
dampen signaling through CTKs. Notably absent in worms are 
protein kinases related to the ZAP70 and JAK CTKs, whose 
primary role in mammals is in signaling through the T cell and 
cytokine receptors, both of which represent more specialized 
pathways not present in worms. Humans have eight SRC-family 
kinases whereas worms have only one. This redundancy has con- 
founded efforts to dissect out the precise role of these CTKs in 
human biology, often requiring "triple knockouts" to demonstrate 
a deficiency. The simplicity of non-FER-like CTKs in worms may
be helpful in placing these CTKs within specific signaling cascades. 

Protein-Tyrosine Kinases: Adaptor and Docking Molecules. Ligand 
activation of RTKs results in tyrosine phosphorylation of both the 
receptor itself (autophosphorylation) and of downstream sub- 
strates. These phosphorylated tyrosines then function as attach- 
ment sites for proteins containing SH2 and other P.Tyr-binding 
domains. We have identified 74 proteins containing a total of 77 
SH2 domains in worms. The majority of these SH2 domains are in 
CTKs, two are present in a SHP2-related PTP, and the remainder
are predicted to represent orthologues of a variety of adaptor
molecules, including phospholipase Cγ, CBL, CIS4/SOCS5, CRK,
NCK, SEM-5/GRB2, SHC, tensin, STAT, and VAV. Worms also 
contain at least 16 PTB domains, which in some cases have been 
found to interact specifically with tyrosine phosphorylated proteins. 
Worm PTB-containing proteins include orthologues of SHC, which 
also contains an SH2 domain, neuronal transmembrane protein 
X11, and an insulin receptor substrate (IRS) family member. The
mammalian X11 PTB domain does not bind to P.Tyr, so we
anticipate only a few of these worm domains will function as 
P.Tyr-binding domains. Additional potential phosphoprotein- 
binding domains identified in worms include three 14-3-3 domains, 
22 WW domains, and 11 FHA domains. 

IRS-1 and IRS-2 are major substrates of the insulin receptor RTK in
mammals, and disruption of IRS-2 in mice leads to metabolic defects
similar to diabetes. Worms have multiple insulin-like peptides, a recep-
tor, and an IRS orthologue, demonstrating the early origins of meta-
bolic regulation in multicellular organisms. The presence of such a
diverse array of adaptor molecules underscores the utility of worms as
a model for understanding mammalian signal transduction.

Other Protein Kinases. Approximately 15% of the worm protein
kinases do not fall into one of the six major groups but include smaller
families with representatives in higher eukaryotes, including CHK1,
DYRK, MLK, TAK1, PIM, RAF, STKR, and the mitotic kinases
(BUB1, AURORA, PLK, and NIMA/NEK). Recent genetic and
biochemical data place TAK1 (transforming growth factor β-associated
kinase) on a MAPK-like pathway at the level of a MAPKKK acting
upstream of the MAPK-family member NLK. The worm orthologues
of TAK1 and NLK regulate Wnt-mediated cell polarization during
embryogenesis (21). Biochemical data also demonstrate that this
MAPK-like pathway negatively regulates Wnt signaling because NLK
phosphorylates the TCF/LEF HMG transcription factors, thereby
inhibiting Wnt-regulated binding of the β-catenin-TCF complex to
DNA. Both of these pathways are conserved between mammals and
worms. The likely orthologous human/worm pairs on the TAK1
MAPK-like pathway include TAK1/MOM-4, NLK/LIT-1, and
TCF4/POP-1. Upstream regulators may include TGFβ1/DBL-1,
TGFβ type I receptor/SMA-6, TGFβ type II receptor/DAF-4 (worms
have three receptor serine kinases). Additional components of the
Wnt-signaling pathway, such as cadherin, the adenomatous polyposis
coli tumor suppressor gene (APC), disheveled, and GSK-3 kinase are
also present in worms, suggesting that there may be a primordial
connection between polarized control of cell division/migration and
cellular transformation in vertebrates (26).

Microbial-Like Kinases: Origin of Protein Kinases? The availability of 
the sequence of the first complete metazoan genome, combined 
with the sequence of budding yeast and several prokaryotic and 
Archaea genomes, provides an excellent opportunity to reassess 
current theories on the evolutionary origin of protein kinases. Pkn1
is a bacterial protein kinase-like sequence first described in the 
Gram-negative bacteria Myxococcus xanthus, which functions in 
growth and differentiation and in the ability of this prokaryote to 
form a fruiting body in response to nutrient starvation. Pkn-related 
proteins are present in other prokaryotes, including Streptomyces, 
Bacillus, Mycobacterium, Pseudomonas, Chlamydia, and Synecho- 
cystis, where they are involved in virulence, secondary metabolism, 
sporulation, and complex growth cycles (27). However, there are no 
Pkn homologues in bacteria with less complex life cycles, such as 
Escherichia coli, and Haemophilus influenzae, or in any Archaea, 
suggesting they may have been acquired by horizontal transmission 
from an early eukaryote, and are unlikely to represent the ancestral 
founders to protein kinases. 

In our kinase profile searches of the worm genome, we detected 
several entries with low profile scores, yet with significant (E 
value < 10"^) random expectation (E) values. Most of these 
contained similarity to kinase subdomains I, II, and VI, containing 
the consensus GxGxxGxV, VAVK, and HxDxxxxN motifs, respec-
tively. Upon further analysis, many of these entries could be
classified into distinct families designated ABC1, RIO1, YGR262,
diacylglycerol kinase, choline/ethanolamine kinases, and the
YLK1-antibiotic resistance kinases. The first three families are
named after their prototypic members in S. cerevisiae (27).
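
As a rough illustration of how the three subdomain consensus motifs named above could be screened for in a sequence, the following Python sketch converts each consensus into a regular expression. It assumes that "x" stands for any single residue; the function names and the toy sequence are hypothetical, and this simple pattern match is not the profile search used in the analysis, which scored whole catalytic-domain profiles.

```python
import re

# Consensus motifs quoted in the text; "x" is assumed to mean any residue.
MOTIFS = {
    "I": "GxGxxGxV",
    "II": "VAVK",
    "VI": "HxDxxxxN",
}

def consensus_to_regex(consensus):
    """Turn a consensus string into a regex source ('x' becomes '.')."""
    return consensus.replace("x", ".")

def find_motifs(sequence):
    """Return {subdomain: [0-based start positions]} for motifs that occur."""
    hits = {}
    for subdomain, consensus in MOTIFS.items():
        # A lookahead reports overlapping occurrences as well.
        pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
        starts = [m.start() for m in pattern.finditer(sequence)]
        if starts:
            hits[subdomain] = starts
    return hits

# Toy sequence (hypothetical, not a real protein).
toy = "MKLGSGAFGTVYAAVAVKMLPPHRDLKPENILL"
print(find_motifs(toy))
```

A sequence that matches only one or two of these short patterns would correspond to the low-scoring entries described above, which is why such hits need further classification before being called protein kinases.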

Worms contain three proteins related to the budding yeast 
ABC1. The yeast protein is required for the assembly of the
mitochondrial cytochrome c reductase complex, which functions as 
an electron carrier during oxidative phosphorylation to generate 
ATP (28). ABC1 homologues are present in numerous pro-
karyotes, including Mycobacterium, Clostridium, Rickettsia, Syn-
echocystis, Azobacter, and Enterobacteriaceae such as E. coli and
Providencia stuartii, in addition to the Archaea Methanobacterium.
ABC1-like proteins are also present in eukaryotes, including fission
yeast, Arabidopsis, worms, and mammals. Although ABC1 homo-
logues are absent from bacteria such as Mycoplasma, Bacillus,
Haemophilus, Helicobacter, and spirochetes, their presence in other 
prokaryotes, Archaea, and eukaryotes positions them as likely 
representatives of the primordial protein kinase, which was the 
progenitor of the eukaryotic protein kinase family. Based on their 
recognized role in mitochondrial ATP production and because they 
maintain many of the structurally important residues and motifs 
involved in ATP binding, the ABC1-family proteins may either bind
ATP or act as phosphotransferases. Conceivably, the ABC1 pro-
teins transfer phosphate to proteins as part of a feedback loop to
sense mitochondrial ATP levels. 

The RIO1 family has three representatives in worms and is named
after one of the two homologues in budding yeast. There are also 
representatives from several Archaea species, but none from bac- 
teria, making them a less attractive candidate as a progenitor to the 
protein kinase lineage. 

Atypical Protein Kinases and Protein Kinase-Like Domains. Worms 
contain 26 kinase-like domains present in receptor guanylyl cyclases 
(there are 10 additional soluble guanylyl cyclases), and at least 7 
diacylglycerol kinases, 7 choline/ethanolamine kinases, and 30 
YLK1-related antibiotic resistance kinases. Each of these families
contains short motifs that were recognized by our profile searches
with low scoring E-values, but a priori would not be expected to
function as protein kinases. Instead, the similarity could simply
reflect the modular nature of protein evolution and the primal role
of ATP binding in diverse phosphotransfer enzymes. However, two
recent papers on a bacterial homologue of the YLK1 family
suggest that the aminoglycoside phosphotransferases (APHs) are
structurally and functionally related to protein kinases (28, 29). 
There are over 40 APHs identified from bacteria that are resistant 
to aminoglycosides such as kanamycin, gentamycin, or amikacin. 
The crystal structure of one well characterized APH reveals that it 
shares >40% structural identity with the two-lobed structure of the 
catalytic domain of cAMP-dependent protein kinase (PKA), in- 
cluding an N-terminal lobe composed of a five-stranded antiparallel 
β sheet and the core of the C-terminal lobe, including several
invariant segments found in all protein kinases (29). APHs lack the
GxGxxG normally present in the loop between β strands 1 and 2 but
contain 7 of the 12 strictly conserved residues present in most
protein kinases, including the HGDxxxN signature sequence in
kinase subdomain VIB (29). Furthermore, Daigle et al. (30) have
demonstrated that this APH also exhibits protein-serine/threonine
kinase activity, suggesting that the worm YLK1-related molecules
may indeed be functional protein kinases. 

The eukaryotic lipid kinases (PI3Ks, PI4Ks, and PIPKs) also 
contain several short motifs similar to protein kinases but otherwise 
share minimal primary sequence similarity. However, once again, 
structural analysis of PIPKIIβ defines a conserved ATP-binding
core that is strikingly similar to conventional protein kinases (31). 
Three residues are conserved among all of these enzymes, including 
(relative to the PKA sequence) Lys-72, which binds the α-phos-
phate of ATP, Asp-166, which is part of the HRDLK motif, and
Asp-184, from the conserved Mg²⁺ or Mn²⁺ binding DFG motif
(31). The worm genome contains 12 phosphatidylinositol kinases, 
including 3 PI3-kinases, 2 PI4-kinases, 3 PIP5-kinases, and 4 
PI3-kinase-related kinases. The latter group has four mammalian
members (DNA-PK, FRAP/TOR, ATM, and ATR), which have 
been shown to participate in the maintenance of genomic integrity 
in response to DNA damage and exhibit true protein kinase activity, 
raising the possibility that other PI-kinases may also act as protein
kinases. Regardless of whether they have true protein kinase 
activity, PI3-kinases are tightly linked to protein kinase signaling, as 
evidenced by their involvement downstream of many growth factor 
receptors and as upstream activators of the cell survival response 
mediated by the AKT protein kinase. 

There are several proteins with protein kinase activity that appear 
structurally unrelated to the eukaryotic protein kinases. These include 
Dictyostelium myosin heavy chain kinase A, Physarum polycephalum
actin-fragmin kinase, the human A6 PTK, human BCR, mitochondrial 
pyruvate dehydrogenase and branched chain fatty acid dehydrogenase 
kinase, and the prokaryotic "histidine" protein kinase family. Worms 
lack representatives of the actin-fragmin kinases, BCR, and bacterial 
histidine kinases yet do contain a single representative of the other 
classes of atypical kinases and two homologues of the A6-related PTKs. 
The single worm orthologue of the Dictyostelium myosin heavy chain
kinase A (32) proves to be the worm eukaryotic elongation factor 2 
kinase (33). The slime mold, worm, and human eukaryotic elongation 
factor 2 kinase homologues have all been demonstrated to have protein 
kinase activity, yet they bear little resemblance to conventional protein 
kinases except for the presence of a putative GxGxxG ATP-binding 
motif (33). 

The so-called histidine kinases are abundant in prokaryotes, with
>20 representatives in E. coli, and have also been identified in yeast, 
molds, and plants. In response to external stimuli, these kinases act as 
part of two-component systems to regulate DNA replication, cell 
division, and differentiation through phosphorylation of an aspartate in 
the target protein (34). To date, no "histidine" kinases have been 
identified in metazoans, although mitochondrial pyruvate dehydroge-
nase (PDK) and branched chain α-ketoacid dehydrogenase kinase are
related in sequence. PDK and branched chain α-ketoacid dehydroge-
nase kinase represent a unique family of atypical protein kinases
involved in regulation of glycolysis, the citric acid cycle, and protein
synthesis during protein malnutrition. Structurally, they conserve only
the C-terminal portion of "histidine" kinases, including the G box
regions. Branched chain α-ketoacid dehydrogenase kinase phosphor-
ylates the E1α subunit of the branched chain α-ketoacid dehydrogenase
complex on Ser-293, proving it to be a functional protein kinase (35).
Although no bona fide "histidine" kinase has yet been identified in 
worms or humans, they do contain PDK homologues (one in worms 
and five in humans). However, the paucity of PDKs in worms makes it 
unlikely that they fill in for the absence of "histidine" kinases in 
metazoans. Instead, these signaling cascades have more likely been 
replaced by pathways initiated through RTKs. 

Based on these examples of atypical protein kinases present in the 
worm genome, we predict additional worm protein kinases will be 
functionally identified that lack any of the obvious motifs conserved 
in the conventional members. Indeed, various biochemical data 
point to the existence of true histidine, lysine, and arginine kinases 
in metazoans, yet their structural identity remains a mystery. 

Protein Phosphatases. Because of their important role in signal 
transduction, it is not surprising that the activity of protein kinases 
must be tightly regulated. This is accomplished through autoinhi- 
bition, autophosphorylation, transphosphorylation, dimerization, 
and cellular localization. Equally important is the role of protein 
phosphatases, which act to remove these regulatory phosphates 
from the protein kinase and its substrates. Because our analysis 
reveals worms to have a mature P.Tyr-signaling network, especially 
when compared with the yeast genome, we surveyed the worm
genome for protein-tyrosine phosphatases. 

Our analysis reveals 83 conventional protein-tyrosine phospha- 
tases (PTPs) plus 6 catalytic fragments and 12 additional fragments 
with high homology to the noncatalytic portion of other worm 
PTPs. In addition, there are 26 proteins classified as dual-specificity 
phosphatases related to MAPK phosphatases, CDC14, PRL, PIR1,
CDC25, myotubularins, or PTEN lipid phosphatases. We also
identify two SBF1- and one STYX-related proteins that are related
to myotubularins and MAPK phosphatases yet lack the catalytic 
cysteine motif. These proteins are predicted to be catalytically inert 
yet may function as phosphoprotein-binding domains or anti- 
phosphatases (36). We also identify 11 inositol polyphosphate 
phosphatases and 65 serine-threonine phosphatases. Among the 83 
PTPs, there are 57 cytoplasmic PTPs and 26 receptor-like PTPs, 
most of which are worm specific, lacking clear human orthologues. 
Exceptions include worm orthologues of the cytoplasmic PTPs
SHP2, MEG1, and MEG2, and the receptor PTPs PTPδ,
PTPγ, PTPμ, PTPρ, and IA2 (catalytically inactive). Overall,
worms contain approximately the same number of tyrosine and 
dual-specificity protein kinases as they do tyrosine and dual- 
specificity protein phosphatases. This coordinate expansion in the 
eukaryotic lineage of both protein-tyrosine kinases and phospha- 
tases emphasizes the biological need to maintain tight regulation of 
tyrosine phosphorylation. Because of the large numbers of worm- 
specific PTKs (FER and KIN-15 families) and worm-specific PTPs
(89%, or 66 of 74), it is tempting to speculate that these unique 
enzymes may regulate each other's activity, or function in the same 
signaling pathways. Precedence for such specificity comes from 
genetic data indicating that the CLR-1 receptor PTP attenuates 
EGL-15, an FGFR orthologue, signaling in worms (37). 

Conclusions. What does the worm genome sequence tell us about 
mammalian signal transduction? First, it has provided an ideal 
model to highlight the bioinformatics challenges that lie ahead with 
the human genome effort and allows us to test our analysis tools and 
database organization. Second, it lets us refine our expectations as 
to the diversity and absolute number of unique protein kinases that 
we can expect to find in the human genome. Based on our count of 
493 (411 conventional and 82 PK-like proteins) worm kinases, 
minus the ≈197 kinases that appear to be worm-specific expansions
of certain families such as the CK1, FER, and KIN-15 families,
multiplied by the ≈4-fold greater number of genes in humans
compared with worms, we predict the human genome to contain
1,100 protein kinases (PTKs and serine/threonine kinases). A
similar extrapolation predicts ≈300 human protein phosphatases
(PTPs, dual-specificity phosphatases, and serine-threonine phos-
phatases). Because our current count of human protein kinases and
phosphatases stands at ≈600 and 130, respectively, we still have
about half the work ahead of us. However, recent claims predict the
human genome may contain as many as 140,000 genes, compared
with previous estimates of ≈80,000. Such calculations would result
in a significant increase in our predictions of the total number of 
human protein kinases and phosphatases. 
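
The kinase extrapolation described above can be written out as a short calculation. The Python sketch below is only an illustration: the function and variable names are hypothetical, and the inputs are the approximate figures quoted in the text. With those rounded inputs the raw product is 1,184, the same order as the roughly 1,100 kinases predicted; the difference presumably reflects the approximate nature of the worm-specific count and of the gene-number ratio.

```python
def extrapolate_human_count(worm_total, worm_specific, gene_ratio):
    """Scale the non-worm-specific count by the human/worm gene-number ratio."""
    return (worm_total - worm_specific) * gene_ratio

# Figures quoted in the text (the last two are approximate).
worm_kinases = 411 + 82   # conventional + PK-like worm kinases
worm_specific = 197       # kinases in worm-specific family expansions
gene_ratio = 4            # human genes relative to worm genes

estimate = extrapolate_human_count(worm_kinases, worm_specific, gene_ratio)
print(estimate)
```

Because the estimate is linear in `gene_ratio`, any upward revision of the human gene count raises the predicted number of kinases in proportion, which is the sensitivity noted at the end of the paragraph.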

We may expect to see less evolutionary expansion of protein kinase
families that serve elemental cellular functions such as cell cycle control
and chromosome segregation, compared with processes involved in 
intercellular signaling or organogenesis. However, there is already 
evidence for at least a 2-fold expansion in the number of CDKs and
"mitotic kinases" from worms to humans. Unlike expressed sequence 
tag data mining and PCR-based gene discovery approaches, genomic 
strategies do not bias against genes whose expression is tightly regulated 
in a cell-, developmental-, or disease-specific manner. This point is 
highlighted by the identification of 650 seven-transmembrane chemo- 
receptors in the worm genome (1), many of which may be expressed 
exclusively in single neurons. Because worms have only ≈302 neurons,
compared with one trillion in humans, it would not be surprising to see 
this selectivity in cellular expression corroborated on mining the human 
genome. Indeed, because many of these novel protein kinases are likely 
to exhibit highly restricted expression, they may prove to be excellent 
targets for drug discovery in the battle against human disease. 

The worm serves as a much simpler and tractable organism than 
humans for deciphering signaling cascades. Although their P.Tyr- 
signaling system is quite mature — based on the content of protein- 
tyrosine kinases, phosphatases, and adaptor molecules — they lack 
much of the molecular redundancy that exists in mammals, allowing 
the geneticist, biochemist, and cell biologist to more readily gen- 
erate an "outline" of the signaling pathways that are conserved 
between worms and humans. The availability of the complete worm 
genome provides a unique opportunity to learn about human 
biology. Predicted orthologous pairs of human and worm genes can 
be targeted by using reverse genetic approaches to identify new 
signaling partners or biologic functions that can then be biochem- 
ically and functionally verified in mammals. 

Although worms and humans have much in common, they also have
obvious differences. Worms do not have limbs or bones, or a circulatory 
or immune system, and they eat only bacteria. Not surprisingly, they lack 
several protein kinases present in humans. Over the next 2 years, we 
should be better able to define which protein kinases are required for 
these specialized functions as the genome sequences of Drosophila and
humans are completed. Identification and classification of the proteins 
present in each is just a first step toward understanding the biological 
complexity of life. 
The reversible phosphorylation of proteins on serine, thre-
onine, and tyrosine residues represents a fundamental
strategy used by eukaryotic organisms to regulate a host of 
biological functions, including DNA replication, cell cycle 
progression, energy metabolism, and cell growth and dif- 
ferentiation. Levels of cellular protein phosphorylation 
are modulated both by protein kinases and phosphatases. 
Although the importance of kinases in this process has 
long been recognized, an appreciation for the complex and 
fundamental role of phosphatases is more recent. Through 
extensive biochemical and genetic analysis, we now know 
that pathways are not simply switched on with kinases and 
off with phosphatases. Rather, it is the balance of phos-
phorylation that is often critical. Protein phosphorylation
can regulate enzyme function, mediate protein-protein in- 
teractions, alter subcellular localization, and control pro- 
tein stability. Furthermore, kinases and phosphatases may 
work together to modulate the strength of a signal. Adding 
further complexity to this picture is the fact that both ki- 
nases and phosphatases can function in signaling networks 
where multiple kinases and phosphatases contribute to the 
outcome of a pathway. To fully understand this complex 
and essential regulatory process, the kinases and phos- 
phatases mediating the changes in cellular phosphoryla- 
tion must be identified and characterized. 

A variety of approaches, including biochemical purifica- 
tion, gene isolation by homology, and genetic screens, 
have been successfully used for the identification of puta- 
tive protein kinases and phosphatases. Now, the genomic 
sequencing of organisms promises to be a major contribu- 
tor to this field. Valuable insight into these important en- 
zymes has already emerged from the analysis of the yeast 
and worm genomes. In particular, genomic sequencing of 
Saccharomyces cerevisiae and Caenorhabditis elegans has
revealed the kinase and phosphatase gene families that 
have arisen during the evolution of multicellular eukary- 
otes (Plowman et al., 1999). With the recent determination 
of the Drosophila sequence, we can now survey the ge- 
nome of a second multicellular eukaryote for its repertoire 
of kinases and phosphatases. In this review, we will 
present our findings on the protein kinase and phos- 
phatase gene families identified in the fly, together with an 
examination of the kinase/phosphatase signaling pathways 
functioning in flies, worms, and humans. 
Identification and Classification of Drosophila
Protein Kinases and Phosphatases

Our survey of Drosophila protein kinases and phospha- 
tases is based on the total set of predicted proteins that 
were identified in the Drosophila genome using auto- 
mated gene predictor methods (Adams et al., 2000; avail- 
able at http://www.celera.com). The 13,601 predicted fly 
proteins were surveyed for overall homology with known 
kinase and phosphatase sequences using BLASTP, and for 
the presence of polypeptide motifs using BLOCKS and In- 
terPro databases (Rubin et al., 2000). Putative kinases and 
phosphatases identified by these means were further clas- 
sified based on the presence of diagnostic amino acid resi- 
dues in conserved motifs and by sequence similarities 
extending beyond conserved catalytic domains. Table I 
summarizes our survey of the Drosophila protein kinases 
and phosphatases. It is important to realize that this analy- 
sis represents the first tabulation of these enzymes in 
Drosophila and will be subject to revision as gaps in the 
genomic sequence are closed and methods for predicting 
and analyzing genes are improved. In particular, it is 
known that the Genie and Genscan programs used to an- 
notate the fly genomic sequence make systematic errors 
with respect to intron-exon boundaries and gene borders, 
leading us to conclude that some kinase and phosphatase 
proteins may have been missed by these programs (Reese 
et al., 2000). These caveats notwithstanding, 251 kinases
and 86 phosphatases were identified by our analysis of the 
predicted Drosophila protein set. Remarkably, more than 
half of these molecules had gone undetected in eight de- 
cades of Drosophila research. 
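
The totals quoted above can be checked against the fly column of Table I. The Python sketch below is only a bookkeeping aid with hypothetical variable names; it assumes that the figure of 251 kinases counts both the "protein kinase" and the "protein kinase like" rows of the table.

```python
# Fly column of Table I: group -> number of predicted proteins.
FLY_KINASES = {
    "AGC": 30, "CaMK": 25, "CKI": 8, "CMGC": 24, "STE": 21,
    "PTK": 32, "OPK": 56, "Atypical": 3, "Fragment/unknown": 18,
}
FLY_KINASE_LIKE = {"Gcyc": 11, "PIK": 13, "DAG": 8, "Choline K": 2}
FLY_PHOSPHATASES = {"STP": 28, "RPTP, CPTP, LMW-PTP": 20, "DSP": 18, "IPP": 20}

# Assumption: the kinase total includes the kinase-like groups.
kinase_total = sum(FLY_KINASES.values()) + sum(FLY_KINASE_LIKE.values())
phosphatase_total = sum(FLY_PHOSPHATASES.values())

print(kinase_total, phosphatase_total)
```

Under that assumption the sums reproduce the 251 kinases and 86 phosphatases stated in the text.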

Protein Kinases 

Eukaryotic protein kinases are enzymes that catalyze the 
transfer of phosphate from ATP or GTP onto serine, thre- 
onine, or tyrosine residues of their appropriate substrates. 
They comprise a single protein superfamily having a com- 
mon catalytic structure. However, these enzymes can be 
subdivided into distinct groups based on their structural 
and functional properties (Hanks and Hunter, 1995). 

AGC Family 

The AGC serine/threonine kinases function in many intra- 
cellular signaling pathways and were first classified based 
on their tendency to phosphorylate sites surrounded by 
basic amino acids. Drosophila contains ~30 AGC kinases,
including members of the cyclic nucleotide-dependent ki- 
Table I. Summary of Protein Kinases and Phosphatases in
Flies, Worms, and Humans

Group                      Fly        Worm*    Humans*

Protein kinase
  AGC                      30 (8)     30       100
  CaMK                     25 (13)    32       83
  CKI                      8 (6)      87       5
  CMGC                     24 (7)     42       62
  STE                      21 (12)    28       63
  PTK                      32 (8)     92       100
  OPK                      56 (28)    62       163
  Atypical                 3 (2)      4        11
  Fragment/unknown         18

Protein kinase like
  Gcyc                     11 (6)     26       8
  PIK                      13 (8)     12       20
  DAG                      8 (5)      7        8
  Choline K                2 (1)      7        2

Phosphatase
  STP                      28 (14)    65       21
  RPTP, CPTP, LMW-PTP      20 (12)    83       47
  DSP                      18 (11)    26       51
  IPP                      20 (18)    11       7

Fly numbers in parentheses represent the proteins newly identified by the fly genome
project.

*These numbers are taken from the review by Plowman et al. (1999).



nases, protein kinase C (PKC),¹ AKT, NDR, MNK,
MAST, ribosomal S6 kinase, and G protein-coupled re- 
ceptor kinase families. The majority of the fly AGC ki- 
nases had been identified previously by molecular and ge- 
netic analysis; however, eight members were uncovered in 
the fly genome project. Interestingly, four of the new 
genes encode PKC or PKC-related proteins, including the 
first atypical PKC isoforms identified in Drosophila. Also 
identified by the fly genome project were additional PKA 
and PKG proteins, as well as kinases related to mamma- 
lian MAST205 and Citron. 

CaMK Family 

The CaMK serine/threonine kinases also tend to have sub- 
strate recognition motifs containing basic amino acids, 
and some but not all members of this family are regulated 
by calcium or calmodulin. Approximately 25 CaMKs are 
present in Drosophila, including representatives of the cal-
cium/calmodulin-regulated kinase, SNF1/AMP-dependent
kinase, EMK, CHK2, myosin light chain kinase (MLCK),
phosphorylase kinase, death-associated protein kinase,
and MAPKAP kinase families (the last four of which
are found in C. elegans but not yeast). Like worms, flies
do not encode a complete ortholog of the mammalian 
Trio kinase, but do have a protein that is related to the en- 
tire Trio regulatory domain. CaMK members revealed by 
the fly genome project include proteins related to calcium/ 
calmodulin-regulated kinases, MLCK, EMK, and mam-
malian DRAK1. Of the 13 newly identified CaMKs, 6 be-
long to the EMK family, making this the largest CaMK
group in flies. Mammalian and C. elegans EMK proteins
have been implicated in the regulation of cell polarity and
microtubule stability (Drewes et al., 1998).

¹Abbreviations used in this paper: CDK, cyclin-dependent kinase; CKI,
casein kinase I; CTK, cytoplasmic tyrosine kinase; DSP, dual specificity
phosphatase; LMW, low molecular weight; MKP, MAPK phosphatase;
PKC, protein kinase C; PTP, protein tyrosine phosphatase; RTK, receptor
tyrosine kinase; STP, serine/threonine protein phosphatase.

Casein Kinase I Family 

The casein kinase I (CKI) proteins originally were charac- 
terized as ubiquitous serine/threonine kinases with a pref- 
erence for acidic substrates such as casein. Although mem- 
bers of this family were among the first kinases purified, 
elucidating their function and regulation has been difficult. 
Recently, however, CKI isoforms have been found to play 
a role in DNA repair and cell division (Gross and Ander-
son, 1998), in the Wnt signaling pathway (Peters et al.,
1999), and in circadian rhythm regulation (Lowrey et al.,
2000). Drosophila contains at least eight CKI proteins,
only two of which were known previously. Intriguingly,
CKI is one of the kinase families that is significantly ex-
panded in the worm, with 87 members identified in C. ele-
gans (Plowman et al., 1999). The biological significance of
the worm-specific expansion is currently unknown. 

CMGC Family 

CMGC family members are primarily proline-directed 
serine/threonine kinases. The major subfamilies of this 
group play key roles in cell cycle regulation and intracellu- 
lar signal transduction, and, not surprisingly, are con- 
served from yeast to humans. Approximately 24 CMGC 
kinases are found in Drosophila, including members of the 
cyclin-dependent kinase (CDK), CDC-like kinase (CLK), 
glycogen synthase kinase 3 (GSK3), and MAPK families. 
Although extensive genetic analysis had revealed many of 
the Drosophila CMGC kinases, seven novel proteins were 
uncovered by the fly genome project. These include addi- 
tional CDK (CDK7-like, CDC2-related KKIALRE, CHED- 
related), GSK3, and MAPK (ERK7) members, as well as 
an RCK family member (MAK). Also uncovered in the fly 
genome were proteins related to the MP1 and JIP-1 scaf-
folding proteins. These molecules function to localize 
MAPK proteins with their upstream activators and pro-
vide signaling specificity (Whitmarsh and Davis, 1998). Al-
though MAPK scaffolding proteins are present in yeast, 
they are structurally different from the ones found in flies, 
worms, and mammals, perhaps indicating the evolution of 
these molecules in multicellular eukaryotes. 

STE Family 

The STE family is composed of the STE7 (MEK), STE11 
(MEKK), and STE20 (MEKKK) kinases that function up-
stream of MAPK proteins. Drosophila contains ~21 mem-
bers of this family, only 9 of which were known previously. 
Remarkably, 9 members of the PAK/STE20 group were 
uncovered by the fly genome project, including proteins 
related to mammalian PAK3, GLK1, NIK, MST2, STLK3, 
TAO1, and CDC7. Although PAK proteins containing PH 
domains are found in yeast (Sells et al., 1999), no PH-
domain-containing PAKs have been identified in higher eu-
karyotes and none are present in Drosophila. MEKK- and 
NEK-related kinases were also revealed by the genome 
project. It is worth noting that even with the discovery of 
additional MEK and MAPK proteins in the fly, C. elegans 
contains over twice as many of these kinases, suggesting an 
expansion of MAPK signaling modules in the worm. 

PTK Family 

The PTK group consists of receptor (RTK) and cytoplas- 
mic (CTK) tyrosine kinases. Although yeasts contain no 
conventional PTKs, 92 have been identified in the worm 
and ~32 are present in the fly. A major function of PTKs 
is in intercellular communication, perhaps explaining why 
these enzymes have only been identified in multicellular 
eukaryotes. In comparison to Drosophila, the much larger 
number of PTKs found in C. elegans is due primarily to ex-
pansions of the worm-specific Kin-15/16 RTK and FER 
CTK families. The majority of the fly PTKs had been iden- 
tified previously by genetic approaches, reflecting the in- 
volvement of these proteins in critical growth and devel- 
opmental pathways. RTKs encoded in the fly genome 
include the fly-specific Torso and Sevenless kinases, as 
well as kinases related in sequence if not function to the 
mammalian EGFR, FGFR, insulin receptor, EPH, RET, 
ROR, RYK, ALK, and TRK kinases. Of the five newly 
identified RTKs, two are related to mammalian PDGFR/
VEGFR, two are DDR receptors, and one shares homol-
ogy with FGFR1. In the CTK group, fly members include 
the JAK, FAK, SYK/SHARK, ACK, ABL, and FPS ki- 
nases. Of the newly identified CTKs, one is related to 
mammalian ACK2 and one is an ortholog of CSK, a ki- 
nase that negatively regulates the activity of mammalian 
SRC kinases. Interestingly, several members of the PTK 
class are not found in worms, including representatives of 
the SYK, JAK, TRK, and RET families. 

OPK Group 

This group is comprised of other protein kinase (OPK) 
families that do not belong to the six major groups de-
scribed above. It is the largest class of kinases found in 
flies and consists of both serine/threonine and dual speci- 
ficity kinases. Approximately 56 of these enzymes are 
present in the fly genome, only half of which were known 
previously. Representatives of this group are extremely di-
verse and include members of the following families: Au-
rora, BUB1, CHK1, DYRK, WEE-1, PLK, EIF2, TGFβ 
and activin receptor, TAK, IKK kinases, CKII, and RAF 
kinase. Notable in the novel group are additional BUB1 
and TAK members and enzymes related to C. elegans 
UNC-51 and mammalian ALK3, DLK, GAK, MLK2, 
SRPK, IRE, ILK, TLK1, LIM-domain kinase, and LKB1/
Peutz-Jeghers kinase. 

Atypical, Lipid, and Unknown Kinases 

Several protein groups that are structurally related to the 
eukaryotic protein kinases are also found in the Drosoph- 
ila genome. These include the atypical kinases, guanylyl 
cyclases, and the eukaryotic lipid kinases. Flies contain at 
least three atypical kinase members, pyruvate dehydroge-
nase kinase, A6, and a newly identified BCR protein. Al-
though worms lack BCR, they do contain a protein related 
to the atypical Dictyostelium myosin heavy chain kinase, 
which appears to be missing in flies. Also absent in both 
Drosophila and C. elegans are representatives of the classi-
cal prokaryotic histidine kinases. In the lipid kinase group, 



Drosophila encodes at least 8 diacylglycerol kinases, 2 
choline/ethanolamine kinases, and 13 phosphatidylinositol 
kinases (PI3-, PI4-, PIP5-, and PIP3-related kinases), the 
majority of which were unknown previously. In mamma- 
lian cells, members of the PIP3-related kinase family par- 
ticipate in the cellular response to DNA damage and have 
authentic protein kinase activity (for review see Fruman 
et al., 1998). The fly genome project has revealed three ki- 
nases of this group, namely ATM, FRAP-related protein 
(FRP), and FRAP/TOR; however, as is true for worms, 
flies do not contain a DNA-PK. Finally, ~18 proteins were 
identified that represent either partial kinase fragments or 
kinases with no significant homology to the groups listed 
above. Since errors have been identified in the transcript 
annotation of several protein kinases, such as the DDR re-
ceptors, Citron, and a PKC isoform, some of the partial ki-
nase sequences may represent intact enzymes that have 
been improperly annotated. Further analysis will be re- 
quired to confirm their identity. 

Protein Phosphatases 

Unlike protein kinases, which share a common catalytic 
structure, protein phosphatases have different basic struc- 
tures, use distinct catalytic mechanisms, and comprise at 
least three separate protein families. Phosphatases are typ- 
ically classified into two main groups, the serine/threonine 
protein phosphatases (STPs) and protein tyrosine phos- 
phatases (PTPs). 

STPs 

STPs can be subdivided into the PPP and PPM families 
based on distinct amino acid sequences and crystal struc-
tures (for review see Cohen, 1997). Both families are 
widely distributed across phyla with representatives found 
in yeast, flies, worms, and mammals. Before the Drosoph- 
ila sequencing project, almost all known fly STPs had been 
identified by molecular cloning approaches. Very few 
STPs have been isolated by genetic analysis, indicating 
that shared substrate specificity and/or functional redun- 
dancy may have prevented the recovery of such mutants. 
Drosophila contains ~28 STPs, whereas >65 are encoded 
in the C. elegans genome. The increased number of worm 
STPs appears to be due to an expansion of the PPP family. 
Members of the PPP family, such as PP1, PP2A, and 
PP2B, have been implicated in numerous biological pro- 
cesses and signal transduction pathways. The diverse func- 
tions of this family are accomplished by a relatively small 
number of highly conserved catalytic subunits that com- 
plex with a wide variety of regulatory proteins, thus tar- 
geting the enzyme to specific intracellular locations and 
substrates. The Drosophila genome encodes ~17 PPP cat-
alytic proteins: 8 PP1-related enzymes (including PP1s, 
PPN, and PPY), 4 PP2A members (including PP2A, PP4, 
and PPV), 3 PP2B-like molecules, and 2 PP5 proteins. Ad-
ditional PPP catalytic subunits uncovered by the fly ge-
nome project include members of the PP1, PP4, and PP2B 
groups. In regard to PPP regulatory subunits, Drosophila 
contains at least 3 PP1, 5 PP2A, and 2 PP2B proteins. 
However, because the regulatory subunits are so diverse, 
these numbers are likely to be low. 

The PPM family includes PP2C and mitochondrial pyru-
vate dehydrogenase phosphatase. Due to their highly di-
vergent primary sequences, few PPM members have been 
isolated by homology-based methods and none have been 
identified by genetic analysis. The only Drosophila PP2C 
protein that had been previously known was identified by 
genomic walking (Dick et al., 1997). Remarkably, the ge-
nome project has uncovered at least 11 new PP2C-related 
sequences, including one that closely resembles pyruvate 
dehydrogenase phosphatase. The biological function of the 
PPM family has been difficult to assess in mammalian cells 
due to the lack of specific inhibitors that target these en- 
zymes. Recently, however, a PP2C protein has been found 
to dephosphorylate CDC2 on Thr161 in yeast (Cheng et al., 
1999). Whether any of the PP2Cs perform a similar func- 
tion in Drosophila waits to be determined. 
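
The PPP catalytic-subunit breakdown given earlier in this section can be cross-checked with a short tally. The Python sketch below uses only the approximate counts quoted in the text; the dictionary name is illustrative and not part of the original analysis.

```python
# Approximate Drosophila PPP catalytic subunit counts, as quoted in the text.
ppp_catalytic = {
    "PP1-related": 8,
    "PP2A": 4,
    "PP2B-like": 3,
    "PP5": 2,
}

# The four classes should add up to the stated total of ~17 PPP catalytic proteins.
total = sum(ppp_catalytic.values())
print(total)
```

The sum agrees with the stated total of about 17 PPP catalytic proteins.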

PTPs 

PTPs are found in all eukaryotic organisms, and are de- 
fined by the catalytic signature motif Cys-X5-Arg (for re- 
view see Neel and Tonks, 1997). The PTP superfamily 
consists of classical PTPs (RPTP, CPTP), dual specificity 
phosphatases (DSPs), and low molecular weight (LMW) 
PTPs. Approximately 38 PTPs are encoded in the fly ge- 
nome, including representatives of each class. Again, 
many more PTPs are found in the worm (109 total). It is 
interesting to note that the expansion of serine/threonine 
and tyrosine kinase families in worms has been accompa- 
nied by a corresponding expansion of both serine/threo- 
nine and tyrosine phosphatases. 
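
The parallel expansion of kinase and phosphatase families in the worm can be made concrete by computing worm-to-fly ratios from the counts quoted in this review. This is only an illustrative sketch: the counts are the approximate values from the text, the worm STP value is a lower bound (">65"), and the variable names are assumptions.

```python
# (worm count, fly count) as quoted in the text; the worm STP value is a lower bound.
counts = {
    "CKI": (87, 8),
    "PTK": (92, 32),
    "STP": (65, 28),
    "PTP": (109, 38),
}

# Fold difference between C. elegans and Drosophila for each family.
fold = {family: worm / fly for family, (worm, fly) in counts.items()}

for family, ratio in fold.items():
    print(f"{family}: {ratio:.1f}x")
```

On these figures the CKI family shows by far the largest relative expansion, while the PTK, STP, and PTP classes are each between two- and three-fold larger in the worm.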

Members of the classical PTP family contain a con-
served catalytic domain that is often fused to a large non-
catalytic region. The PTP noncatalytic domains are quite 
diverse and can function to regulate enzyme activity and/ 
or mediate protein interactions. Like PTKs, classical PTPs 
can be divided into two groups, receptor PTPs (RPTPs) 
and cytoplasmic PTPs (CPTPs). Genetic studies in Dro- 
sophila have been instrumental to our understanding of both 
groups. In particular, experiments in the fly were among 
the first to demonstrate the involvement of RPTPs in neu- 
ronal axon guidance (for review see Desai et al., 1997; den 
Hertog, 1999). Drosophila encodes ~8 RPTPs, at least 5 
of which function in this capacity. Of the newly identified 
RPTPs, one is related to mammalian RPTP-κ and two 
share homology with RPTP-X/IA2, a type 1 transmem-
brane PTP implicated in nervous system development and 
insulin-mediated pancreatic function. In regard to the 
CPTP class, Drosophila studies on the CSW phosphatase 
were pivotal in demonstrating that a CPTP could function 
as a positive effector of cell signaling (Perkins et al., 1992). 
CSW is a member of the SH2-domain containing PTPs 
(SHP subclass). Mammals are known to have at least two 
SHPs, whereas no additional SHP proteins were found in 
Drosophila, indicating that flies, like worms, possess a sin-
gle SHP molecule. Overall the fly genome encodes at least 
5 CPTPs, namely CSW, PTP-ER, and newly identified 
CPTPs related to the mammalian MEG1, MEG2, and 
PTPD1 phosphatases. Finally, Drosophila contains four 
additional PTP-related proteins which are either difficult 
to classify or represent incomplete phosphatase fragments. 

DSPs are a diverse collection of phosphatase subgroups 



that share little sequence homology outside of the con- 
served Cys-X5-Arg motif with other DSP subgroups or 
with members of the larger PTP family. DSPs were origi- 
nally characterized by their ability to dephosphorylate both 
serine/threonine and tyrosine residues; however, some of 
the DSP subgroups, namely PTEN and myotubularin, also 
possess lipid phosphatase activity (Maehama and Dixon, 
1999). Approximately 18 DSPs are found in Drosophila, 
including representatives of the MAPK phosphatase (MKP), 
PTEN, nuclear prenylated PRL, myotubularin, PIR1, 
CDC14, and CDC25 phosphatase groups. Of the nine 
DSPs uncovered by the fly genome project, six belong to 
the MKP group, a remarkable finding considering the ex- 
traordinary effort spent studying MAPK pathways in 
Drosophila. Only Puckered, a negative regulator of the 
JNK pathway, previously had been identified by genetic 
techniques (Martin-Blanco et al., 1998). The failure of the 
new MKPs to be uncovered by genetic analysis may indi- 
cate that they participate in MAPK pathways controlling 
subtle or unappreciated phenotypes. Alternatively, their 
functions may have been obscured by redundancy within 
the MKP group or with other phosphatases. Additional 
DSPs revealed by the genome project include enzymes re-
lated to CDC14 and myotubularin. Interestingly, flies also 
contain three myotubularin-related sequences that lack 
the active site Cys and Arg residues. As has been sug-
gested for similar mammalian myotubularin-related mole-
cules, these proteins may function as antiphosphatases by 
binding to and protecting substrates from dephosphoryla-
tion by myotubularin or related phosphatase (Hunter, 1998; 
for review see Laporte et al., 1998). 

LMW-PTPs are ~150-amino acid residue cytoplasmic 
enzymes that have been shown to possess tyrosine phos-
phatase activity (Ostanin et al., 1995). Other than a strictly 
conserved Cys-X5-Arg catalytic motif, LMW-PTPs bear 
little resemblance to the other PTP members. Mammalian 
LMW-PTPs have been implicated to function in EPH 
(Stein et al., 1998) and PDGF receptor signaling (Chiarugi 
et al., 2000); however, much remains to be learned regard-
ing the biological activity of these enzymes. Although two 
putative LMW-PTPs are revealed by the Drosophila ge-
nome project, both predicted proteins are larger than 
would be expected (424 and 250 amino acids, respec-
tively). The smaller protein contains a complete LMW-
PTP domain but lacks the conserved Arg residue in the 
catalytic motif. Intriguingly, the larger protein has two 
complete LMW-PTP domains. Although the first domain 
has a mutation in the active site Cys residue and is likely to 
be inactive, the second domain contains an intact PTP cat- 
alytic motif and presumably has catalytic activity. If this 
protein is made in vivo, it would represent a new type of 
LMW-PTP having a tandem catalytic domain structure 
similar to that observed in many RPTPs. Whether this 
molecule is an authentic LMW-PTP and whether it has a 
human counterpart remains to be determined. 

Lipid Phosphatases 

Lipid inositol phosphatases play an important role in 
mediating the intracellular balance of second messenger 
phospholipids. Drosophila encodes approximately 20 ino-
sitol phosphatases (IPP), only 2 of which were known pre-
viously. Six inositol-1,4,5-triphosphate phosphatase-like 
enzymes are contained in the fly genome; yet as is true 
for worms, no ortholog of the mammalian SH2-domain- 
containing inositol 5' phosphatase (SHIP) appears to be 
present. Drosophila does encode eight PPAP enzymes, 
which dephosphorylate phosphatidic acid to generate di- 
acylglycerol. The prototype member of this class, Wunen, 
was first identified in a genetic screen for factors controlling 
germ cell migration in the early Drosophila embryo (Zhang 
et al., 1996). Related proteins were subsequently identified 
in yeast, worms, and mammals. Remarkably, the fly genome 
project reveals seven additional Wunen-like phosphatases. 
Also uncovered by the genome project are six members of 
the inositol monophosphate phosphatase (IMP) group. Both 
the Wunen-like and inositol monophosphate phosphatases 
are characterized by small tandem gene arrangements, sug- 
gesting a limited expansion of these phosphatase families in 
Drosophila. The large number of newly identified inositol 
phosphatases underscores the hitherto unappreciated im- 
portance of lipid phosphoregulation in the fly. 

Comparative Analysis of 
Phosphorylation-dependent Signaling Pathways 

With the completion of both the Drosophila and C. ele-
gans genome projects, together with our current knowl-
edge of mammalian signaling pathways, we can begin to 
draw conclusions regarding the regulatory complexity of 
protein phosphorylation mechanisms across the evolution- 
ary spectrum. For example, in flies, worms, and humans, 
there is a high degree of structural and functional conser- 
vation between the components of the RTK and stress- 
activated signaling pathways, with the major difference 
being the number of isoforms present for individual path- 
way members. In higher organisms, the number of isoforms 
is increased, presumably providing greater potential for 
tissue- or stage-specific functions, signaling cross-talk, and 
regulatory complexity (Fig. 1). Significantly, differences 
in phosphorylation-mediated signaling cascades between 
worms, flies and humans become apparent when examin- 
ing the pathways involved in hematopoiesis and immunity. 
The JAK/STAT cascade, which has been implicated in he- 
matopoiesis and cytokine signaling, is present in humans 
and flies. Worms, however, lack JAK kinases but do pos- 
sess STAT proteins that are regulated by tyrosine phos- 
phorylation. Like humans, flies also contain the Toll/IKK/ 
NFκB pathway, which plays a role in the immune response 
to microbial organisms. No evidence of an inducible host 
defense system has been demonstrated in worms, consis-
tent with the lack of this pathway in C. elegans. Also miss-
ing in the worm are the SYK/ZAP70 kinases which play an 
important role in human T and B cell signaling. Dro-
sophila may possess some form of this pathway as indicated 
by the presence of the fly SHARK kinase. The Drosophila 
SHARK kinase is a member of the SYK/ZAP70 family; 
however, it is most closely related to the HTK16 kinase of 
Hydra based on the presence of ANK repeats which are 
not found in any of the known mammalian SYK/ZAP70 
family members (Chan et al., 1994; Ferrante et al., 1995). 
Exact homologues of proteins functioning with SYK/
ZAP70 in the mammalian hematopoietic cascade, includ-
ing the SLP-76, LAT, and BLNK adaptor proteins, the 
LCK and LYN kinases, and the SHP-1 and SHIP phos-
phatases were not revealed by the fly genome project; how-
ever, Drosophila proteins with related biological activities 
are found, namely SHP-2, inositol-1,4,5-triphosphate phos-
phatase, and other SRC-kinase members. Thus, further 
studies are required to determine whether a rudimentary 
form of the SYK/ZAP70 pathway does function in flies. 

Figure 1. Comparison of the protein kinase/
phosphatase signaling pathways in flies, 
worms, and humans (see text for description). 
Kinases are depicted as black rectangles, 
phosphatases are gray triangles, and other sig-
naling components are in white. Shapes in 
dotted lines indicate mammalian proteins 
with no clear fly homologue; however, the 
function of these components in the pathway 
may be provided by other Drosophila pro-
teins with related biochemical activities. 
Drosophila gene names are listed in paren-
theses. 

The completion of the Drosophila genome project also 
allows us to look globally at the pathways in which many 
of the newly identified fly enzymes may function. In par- 
ticular, many of the proteins revealed in the Drosophila 
genome are orthologs of kinases and phosphatases known 
to function in the Rac/Rho/CDC42 signaling pathway 
(Citron, ACK2, MLK2, MEKK4, LIM-domain kinase, 
PAK/STE20, and DSP members), in cell cycle regula-
tion (CDK7, BUB1, NEK1, NEK2, CDC14, CDC7, and 
PP2C), and in pathways establishing asymmetry and cell 
polarity (LKB1, SLK1, and EMK kinases). Whether these 
enzymes went undetected for so many years because of 
functional redundancy or unappreciated phenotypes has 
yet to be determined. 

In conclusion, ~251 protein kinases and 86 phos- 
phatases have been identified in the Drosophila genome. 
Although the overall number of fly enzymes is lower than 
that found in C. elegans, the difference is largely due to the 
worm-specific expansion of certain gene families. Interest- 
ingly, no large expansions or deletions of particular kinase 
or phosphatase gene families were uncovered by the 
Drosophila genome project. All of the previously known 
Drosophila kinases and phosphatases were detected in our 
analysis, confirming the relative completeness of the ge- 
nome sequence data. Remarkably, almost 170 new protein 
kinases and phosphatases were identified by the fly ge- 
nome project (Table I). The next challenge for scientists 
will be to determine the role of these enzymes in Dro- 
sophila development and physiology. 
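
The summary totals can be checked against the class-level counts given earlier in the review. The sketch below assumes that the 86 phosphatases are the sum of the STP, PTP, and lipid inositol phosphatase counts; the text does not state this breakdown explicitly, and all values are the approximate figures quoted above.

```python
# Approximate Drosophila phosphatase counts quoted in the review, by class.
# Assumption: these three classes together make up the stated total of 86.
phosphatases = {"STP": 28, "PTP": 38, "lipid inositol": 20}
phosphatase_total = sum(phosphatases.values())

kinase_total = 251       # "~251 protein kinases"
newly_identified = 170   # "almost 170 new protein kinases and phosphatases"

total_enzymes = kinase_total + phosphatase_total
new_fraction = newly_identified / total_enzymes

print(phosphatase_total, total_enzymes, f"{new_fraction:.0%}")
```

Under that assumption the class counts reproduce the stated 86 phosphatases, and roughly half of the combined kinase and phosphatase set was newly identified by the genome project.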

We are grateful to Celera Genomics and the Berkeley Drosophila Ge- 
nome Project for providing access to the Drosophila genome sequence 
and predicted protein dataset. We thank Greg Plowman for his advice and 

Viewpoint 

ERK2 pathways, which contributes to the 
increased proliferative rate of tumor cells. For 
this reason, inhibitors of the ERK pathways 
are entering clinical trials as potential anti-
cancer agents. In differentiated cells, ERKs 
have different roles and are involved in re-
sponses such as learning and memory in the 
central nervous system. 

JNK1, JNK2, and JNK3 

The JNKs were isolated and characterized as 
stress-activated protein kinases on the basis of 
their activation in response to inhibition of pro-
tein synthesis (8). The JNKs were then discov-
ered to bind and phosphorylate the DNA bind-
ing protein c-Jun and increase its transcriptional 
activity. c-Jun is a component of the AP-1 
transcription complex, which is an important 
regulator of gene expression. AP-1 contributes 
to the control of many cytokine genes and is 
activated in response to environmental stress, 
radiation, and growth factors, all stimuli that 
activate JNKs. Regulation of the JNK pathway 
is extremely complex and is influenced by 
many MKKKs. As depicted in the STKE JNK 
Pathway Connections Map, there are 13 
MKKKs that regulate the JNKs. This diversity 
of MKKKs allows a wide range of stimuli to 
activate this MAPK pathway. JNKs are impor-
tant in controlling programmed cell death or 
apoptosis (9). The inhibition of JNKs enhances 
chemotherapy-induced inhibition of tumor cell 
growth, suggesting that JNKs may provide a 
molecular target for the treatment of cancer. 
JNK inhibitors have also shown promise in 
animal models for the treatment of rheumatoid 
arthritis (10). The pharmaceutical industry is 
bringing JNK inhibitors into clinical trials for 
both diseases. 

p38 Kinases 

There are four p38 kinases: α, β, γ, and δ. The 
p38α enzyme is the most well characterized 
and is expressed in most cell types. The p38 
kinases were first defined in a screen for drugs 
inhibiting tumor necrosis factor α-mediated in-
flammatory responses (11). The p38 MAPKs 
regulate the expression of many cytokines. p38 
is activated in immune cells by inflammatory 
cytokines and has an important role in activa-
tion of the immune response. p38 MAPKs are 
activated by many other stimuli, including hor-
mones, ligands for G protein-coupled recep-
tors, and stresses such as osmotic shock and 
heat shock. Because the p38 MAPKs are key 
regulators of inflammatory cytokine expres-
sion, they appear to be involved in human 
diseases such as asthma and autoimmunity. 

Recently, a major paradigm shift for 
MAPK regulation was developed for p38α. 
The p38α enzyme is activated by the protein 
TAB1 (12), but TAB1 is not a MKK. Rather, 
TAB1 appears to be an adaptor or scaffolding 
protein and has no known catalytic activity. 
This is the first demonstration that another 
mechanism exists for the regulation of 
MAPKs in addition to the MKKK-MKK-
MAPK regulatory module. This important 
observation indicates that other adaptor pro-
teins should be scrutinized for potential roles 
in regulating MAPK activity. 

The importance of MAPKs in controlling 
cellular responses to the environment and in 
regulating gene expression, cell growth, and 
apoptosis has made them a priority for re-
search related to many human diseases. The 
ERK, JNK, and p38 pathways are all molec-
ular targets for drug development, and inhib-
itors of MAPKs will undoubtedly be one of 
the next group of drugs developed for the 
treatment of human disease (13). 

The Protein Kinase Complement 
of the Human Genome 

G. Manning,* D. B. Whyte, R. Martinez, T. Hunter, 

We have catalogued the protein kinase complement of the human genome (the 
"kinome") using public and proprietary genomic, complementary DNA, and 
expressed sequence tag (EST) sequences. This provides a starting point for 
comprehensive analysis of protein phosphorylation in normal and disease 
states, as well as a detailed view of the current state of human genome analysis 
through a focus on one large gene family. We identify 518 putative protein 
kinase genes, of which 71 have not previously been reported or described as 
kinases, and we extend or correct the protein sequences of 56 more kinases. 
New genes include members of well-studied families as well as previously 
unidentified families, some of which are conserved in model organisms. Clas-
sification and comparison with model organism kinomes identified orthologous 
groups and highlighted expansions specific to human and other lineages. We 
also identified 106 protein kinase pseudogenes. Chromosomal mapping re-
vealed several small clusters of kinase genes and revealed that 244 kinases map 
to disease loci or cancer amplicons. 

Table 1. Kinase distribution by major groups in human and model systems. A detailed classification is available in tables S1 and S6. 

Ever since the discovery nearly 50 years ago 
that reversible phosphorylation regulates the ac-
tivity of glycogen phosphorylase, there has 
been intense interest in the role of protein phos-
phorylation in regulating protein function. With 
the advent of DNA cloning and sequencing in 
the mid-1970s, it rapidly became clear that a 
large family of eukaryotic protein kinases ex-
ists, and the burgeoning numbers of protein 
kinases led to the speculation that a vertebrate 
genome might encode as many as 1001 protein 
kinases (1). The near-completion of the human 
genome sequence now allows the identification 
of almost all human protein kinases. The total 
(518) is about half that predicted 15 years ago, 
but it is still a strikingly large number, consti-
tuting about 1.7% of all human genes. 

Protein kinases mediate most of the signal 
transduction in eukaryotic cells; by modifica-
tion of substrate activity, protein kinases also 
control many other cellular processes, includ-
ing metabolism, transcription, cell cycle pro-
gression, cytoskeletal rearrangement and cell 
movement, apoptosis, and differentiation. 
Protein phosphorylation also plays a critical 
role in intercellular communication during 
development, in physiological responses and 
in homeostasis, and in the functioning of the 
nervous and immune systems. Protein ki-
nases are among the largest families of genes 
in eukaryotes (2-6) and have been intensive-
ly studied. As such, they made an attractive 
target for an initial in-depth analysis of the 
gene distribution in the draft human genome. 
Mutations and dysregulation of protein ki-
nases play causal roles in human disease, 
affording the possibility of developing ago-
nists and antagonists of these enzymes for use 
in disease therapy (7-9). A complete catalog 
of human protein kinases will aid in the 
discovery of human disease genes and in the 
development of therapeutics. 
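
The proportions quoted in the introduction (518 kinases, "about 1.7% of all human genes", and "about half" of the earlier estimate of 1001) can be turned into a quick back-of-the-envelope check. The sketch below uses only those three quoted numbers; the implied gene count is a derived illustration, not a figure stated in the article.

```python
kinase_genes = 518
fraction_of_genes = 0.017     # "about 1.7%" of all human genes
earlier_prediction = 1001     # the earlier speculative estimate

# Total gene count implied by the two quoted figures.
implied_gene_count = kinase_genes / fraction_of_genes

# How the observed total compares with the earlier prediction.
ratio_to_prediction = kinase_genes / earlier_prediction

print(round(implied_gene_count))
print(round(ratio_to_prediction, 2))
```

The quoted figures together imply a total on the order of 30,000 genes, and the observed kinase count is indeed close to half of the earlier prediction.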

Comprehensive Discovery of Protein 
Kinase Genes 

Most protein kinases belong to a single su- 
perfamily containing a eukaryotic protein ki- 
nase (ePK) catalytic domain. We set out to 
identify all sequenced human ePKs by 
searching every available human sequence 
source (public and Celera genomic databas- 
es, Incyte ESTs, in-house and GenBank 
cDNAs and ESTs) with a hidden Markov 
model (HMM) profile of the ePK domain 
(10). This profile is sensitive enough to detect 
short fragments of even very divergent ki-
nases that have little similarity to any single 
known kinase. We extended these fragments 
to full-length gene predictions using a com-
bination of EST and cDNA data, Genewise 
homology modeling, and Genscan ab initio 
gene prediction; more than 90% of the new 
and extended sequences were verified by 
cDNA cloning. We also identified 13 atypical 
protein kinase (aPK) families. These contain 
proteins reported to have biochemical kinase 
activity, but which lack sequence similarity 
to the ePK domain, and their close ho-
mologs (10). Some aPKs have structural 
similarity to ePK domains (11). New aPKs 
were identified with the use of additional 
HMMs and Psi-Blast. 
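
The idea behind a profile search, scoring each position of a candidate sequence against position-specific residue preferences, can be illustrated with a much simpler stand-in. The Python sketch below builds an ungapped log-odds position weight matrix and slides it along a sequence. It is not a profile HMM (it has no insert or delete states) and it is not the authors' pipeline: the toy alignment (loosely modelled on a short kinase catalytic-loop motif), the uniform background, the pseudocount, the query sequence, and the function names are all assumptions made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_log_odds(alignment, pseudocount=1.0):
    """Return one dict of log2-odds scores per alignment column (uniform background)."""
    background = 1.0 / len(AMINO_ACIDS)
    denom = len(alignment) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for column in zip(*alignment):
        scores = {
            aa: math.log2(((column.count(aa) + pseudocount) / denom) / background)
            for aa in AMINO_ACIDS
        }
        matrix.append(scores)
    return matrix


def best_hit(matrix, sequence):
    """Slide the matrix along the sequence and return (best_score, start_index)."""
    width = len(matrix)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[aa] for col, aa in zip(matrix, window))
        best = max(best, (score, start))
    return best


# Hypothetical aligned instances of a short motif (illustrative only).
toy_alignment = ["HRDLKPEN", "HRDLAARN", "HRDIKPSN", "HRDLKPQN"]
pwm = build_log_odds(toy_alignment)

# Hypothetical query sequence containing one copy of the motif.
score, position = best_hit(pwm, "MSTAGYVHRDLKPENILLD")
print(position, round(score, 2))
```

A real profile HMM additionally models insertions and deletions and is trained on a full domain alignment, which is what allows short fragments of divergent kinases to be detected as described above.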

How Many Protein Kinases In the 
Genome? 

We identified 478 human ePKs and 40 aPK 
genes (Table 1 and Fig. 1) (table S1). Of 
these 518 protein kinases, 24 are absent from 
the public Genpept database, and 47 more are 
published only as hypothetical proteins or are 
not described as kinases. Many more are 
annotated only by automatic methods, or are 
fragmentary sequences and have not been 
individually studied. Most new kinases come 
from new and little-studied families, as tar- 
geted cloning has previously identified most 
members of well-known families. However, 
new members were found even in some of the 
best studied kinase families. One new mem- 
ber of the cyclin-dependent kinase (CDK) 
family was found: CDK11 is a close paralog 
of CDK8 (91% protein sequence identity for 
most of their length), a kinase that interacts 
with cyclin C and RNA polymerase II (12). A 
CDK11 ortholog exists in mouse, but fly 
(Drosophila melanogaster), worm (Caeno-
rhabditis elegans), and yeast (Saccharomyces 
cerevisiae) have only a single member of this 
CDK8/CDK11 family. The Nek (NimA-
related kinase) family is also thought to have 
a role in the cell cycle; we discovered four 
new Neks to bring the human total to 11 Nek 
kinases. Within the mitogen-activated protein 
kinase (MAPK) cascade, we found two new 
Ste11/MAP3K (MAP kinase kinase kinase) 
and two new Ste20/MAP4K (MAP kinase 
kinase kinase kinase) genes, all of which have 
restricted expression that may explain their 
failure to be previously cloned. For instance, 
only 14 ESTs are known from MAP3K8, and 
all but one derive from testis, lung, or brain 
libraries, indicating that these new genes may 
have evolved to mediate specialized roles in 
selected tissues. 

Fig. 1. Dendrogram of 491 ePK domains from 478 genes. Major groups (Table 1) are labeled and 
colored. For group-specific and comparative genomic trees, see www.kinase.com/human/kinome. 
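
The headline counts in this section are internally consistent, which a short tally makes explicit. The sketch below uses only numbers quoted in the text (478 ePKs, 40 aPKs, 24 kinases absent from Genpept, 47 published only as hypothetical proteins or not described as kinases); the variable names are illustrative.

```python
epk_genes = 478
apk_genes = 40
total_kinases = epk_genes + apk_genes

absent_from_genpept = 24
not_described_as_kinases = 47
newly_reported = absent_from_genpept + not_described_as_kinases

print(total_kinases, newly_reported)
print(f"{newly_reported / total_kinases:.1%}")
```

The two sums reproduce the 518 putative kinase genes and the 71 previously unreported or undescribed kinases stated in the abstract.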

Classification and Phylogeny of the 
Human Kinome 

To compare related kinases in human and 
model organisms and to gain insights into 
kinase function and evolution, we classified 
all kinases into a hierarchy of groups, fami- 
lies, and subfamilies. This extends the Hanks 
and Hunter (13) human kinase classification 
of five broad groups, 44 families, and 51 
subfamilies by adding four new groups, 90 
families, and 145 subfamilies (Table 1 and 
Fig. 1) (table S1). Kinases were classified 
primarily by sequence comparison of their 
catalytic domains (10), aided by knowledge 
of sequence similarity and domain structure 
outside of the catalytic domains, known bio- 
logical functions, and a similar classification 
of the yeast, worm, and fly kinomes (4). 

Of the four new groups, STE consists of MAPK cascade families
(Ste7/MAP2K, Ste11/MAP3K, and Ste20/MAP4K). The CK1 group contains CK1,
TTBK (tau tubulin kinase), and VRK (vaccinia-related kinase) families.
TKL (tyrosine kinase-like) is a diverse group of families that resemble
both tyrosine and serine-threonine kinases. It consists of the MLK
(mixed-lineage kinase), LISK (LIMK/TESK), IRAK [interleukin-1 (IL-1)
receptor-associated kinase], Raf, RIPK [receptor-interacting protein
kinase (RIP)], and STRK (activin and TGF-β receptors) families. Members
of the RGC (receptor guanylate cyclase) group are also similar in domain
sequence to tyrosine kinases.

Phylogenetic comparison of the human kinome with those of yeast, worm,
and fly (4) confirms that most kinase families are shared among
metazoans and defines classes that are expanded in each lineage. Of 189
subfamilies present in human, 51 are found in all four eukaryotic
kinomes, and these presumably serve functions essential for the
existence of a eukaryotic cell. An additional 93 subfamilies are present
in human, fly, and worm, implying that these evolved to fulfill distinct
functions in early metazoan evolution. Comparison with the draft mouse
genome indicates that more than 95% of human kinases have direct
orthologs in mouse; additional orthologs may emerge as that genome
sequence is completed.

The functions of human kinases can be inferred from family members in
model organisms. For instance, the BRSK (brain-selective kinase) family
has two uncharacterized human members that are selectively expressed in
brain. They are orthologous to worm SAD-1, which has a role in
presynaptic vesicle clustering (14), suggesting a conserved function. A
highly conserved ascidian (chordate) homolog is also expressed in neural
tissue and is asymmetrically localized to the posterior end of the
embryo, suggesting a second role in embryonic axis determination (15).
Conversely, we identified four families with orthologs in human, fly,
and worm where no functional data are available for any member. Their
phylogenetic distribution hints at roles fundamental to metazoan biology
of which we are still ignorant.

The human genome has approximately twice 
as many kinases as those of fly or worm, after 
idiosyncratic worm-specific expansions are 
trimmed (4). Accordingly, most kinase families 
have twice as many human members as they 
have in worm or fly. However, the expansion is 
not uniform: 25 subfamilies — including CDK5, 
CDK9, and Erk7 — have just one member in 
each organism, indicating critical unduplicated 
functions. Conversely, substantial human ex- 
pansions occurred in several families, with the 
most striking example being Eph family recep- 
tor tyrosine kinases (RTKs), where there are 14 
genes in human and only 1 in fly and worm 
(Table 2). These expanded families function 
predominantly in processes that are more ad- 
vanced in human, such as the nervous and im- 
mune systems, angiogenesis, and hemopoiesis, 
as well as functions that are less obviously 
enhanced, such as apoptosis, MAPK signaling, 
calmodulin-dependent signaling, and epidermal
growth factor (EGF) signaling.

Fourteen families are found only in hu- 
man. The Tie family of RTKs are expressed 
in endothelial cells and function in angiogen- 
esis, and the Axl RTKs (Axl, Mer, and Ty- 
ro3) function in both hemopoietic and neural 
tissues. The Trio and RIPK families have 
invertebrate homologs that lack kinase do- 
mains. They are involved in muscle function 
and apoptotic signaling via tumor necrosis 
factor (TNF), Fas, and NF-κB, respectively.
Lmr, NKF3, NKF4, NKF5, and HUNK are
novel families whose functions are largely
unknown, and BCR, FAST, G11, H11, and
DNAPK are atypical kinases. 

The human expansions of many of these
families can be traced both to large duplications
of multigene loci ("paralogons") and to
local tandem duplications of smaller loci often
containing just one gene. This supports
recent findings that vertebrate genome com- 
plexity may derive from ancient large-scale 
duplications as well as a continuing series 
of smaller scale duplications (16-18). For 
instance, each of the four human epidermal 
growth factor receptors (EGFRs) maps 
close to one of the four HOX clusters, 
implying that the proposed double duplica- 
tion of that cluster early in vertebrate evo- 
lution created the EGFR family from a 
single ancestral EGFR gene (19). Similarly, 
the eight genes of the VEGFR and PDGFR 
(vascular endothelial growth factor and 
platelet-derived growth factor receptors) 
families map to three of the four paraHOX 
clusters, and they probably derive from 
duplications of the single ancestral paraHOX 
locus as well as local duplications within the 
paraHOX loci (table S3). The common an- 
cestry of PDGFR and VEGFR families is 
supported by the Drosophila kinome, which 
contains two genes whose sequences are in- 
termediate between those two families (4). 

We mapped all kinase genes to chromo- 
somal loci to look for origins of kinase ex- 
pansions and to link kinases with known 
disease loci. The map was created using the 
Celera and public genome assemblies and 
literature references (table S2). Although the 
overall kinase distribution is similar in den- 
sity to that of other genes, many pairs of 
closely related genes from the same families 
map closer to each other than expected by 
chance, indicating that they may have arisen 
through local chromosomal duplications (ta- 
ble S3). Seven pairs are within 30 kb of each 
other, all in tandem orientation. Another six 
pairs are within 1 Mb of each other, and 15 
more within 10 Mb. In all, 66 genes map 
unusually near to close paralogs, indicating 
that at least 6% of kinases may have arisen by 
local duplications. Most of these genes are 
from families that are highly expanded in 
human compared with worm and fly, further
supporting a recent origin. The multigene 
duplications are thought to have arisen most- 
ly during early vertebrate evolution, but some 
local duplications may also have happened at 
this time. For instance, the clustering of 
PDGFRβ and CSF-1 receptor (c-fms) genes
is conserved in pufferfish (20). 
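
The "at least 6%" lower bound can be reproduced with a short calculation. The sketch below assumes, as one plausible reading that the text does not spell out, that each closely linked pair contains one ancestral gene and one gene created by duplication, so the 66 clustered genes imply at least 33 duplication-derived genes among the 518 kinases reported in this article. The function and constant names are illustrative only.

```python
# Counts are taken from the text; the "one derived gene per pair" rule
# is an assumption made for this illustration.
TOTAL_KINASES = 518
GENES_NEAR_PARALOGS = 66


def min_duplicated_fraction(clustered_genes: int, total: int) -> float:
    """Lower bound on the fraction of genes created by local duplication,
    assuming half of the clustered genes are the derived copies."""
    derived = clustered_genes // 2
    return derived / total


fraction = min_duplicated_fraction(GENES_NEAR_PARALOGS, TOTAL_KINASES)
print(f"{fraction:.1%}")  # 6.4%
```

Under a different assumption (for example, counting every clustered gene) the figure would be closer to 13%, so the halving step is what makes the result match the stated lower bound.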

Chromosomal Mapping and Disease 

The knowledge of the exact chromosomal lo- 
cations of genes afforded by the complete hu- 
man genome assemblies is increasingly valu- 
able in pinpointing candidate disease genes 
within loci that are associated with specific 
diseases. Comparison of the kinase chromo- 
somal map with known disease loci indicates 
that 164 kinases map to amplicons seen frequently
in tumors (21) and 80 kinases map to
loci implicated in other major diseases (table
S2). Although each locus covers many genes,
these data provide entry points for studying 
both the function of these kinases and their 
potential as the causative principle of these 
diseases. The role of kinases as biological 
control points and their tractability as drug 
targets make them attractive targets for dis- 
ease therapy. 

Catalytically Inactive Kinases 

Several ePK domains are known to lack kinase 
activity experimentally, and these have been 
postulated to act as kinase substrates and scaf- 
folds for assembly of signaling complexes (22- 
24). Our sequence analysis shows that 50 hu- 
man kinase domains lack at least one of the 
conserved catalytic residues (Lys30, Asp125,
and Asp143) (table S5) and are predicted to be
enzymatically inactive. Twenty-eight inactive 
kinases belong to families where all members 
are inactive in human, fly, and worm, and even 
in yeast. Thus, surprisingly, nearly 10% of all 
kinase domains appear to lack catalytic activity. 
However, these domains are otherwise well 
conserved and are likely to maintain the typical 
kinase domain fold. This suggests that this domain
can have generalized noncatalytic functions;
it is also possible that they use a modified
catalytic mechanism that does not require these 
residues. This has been shown for the Wnk 
family, where Lys13 is thought to replace Lys30
in adenosine triphosphate (ATP) binding (25). 
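
The prediction rule described above, that a domain missing any of the three conserved catalytic residues is presumed inactive, can be sketched as follows. The function name, the dictionary layout, and the example residues are hypothetical; the authors' analysis worked from the alignments summarized in table S5, which are not reproduced here.

```python
# Minimal sketch; names and input format are illustrative assumptions.
# Keys label the three conserved catalytic positions (one Lys, two Asp).
EXPECTED = {"lys": "K", "asp_1": "D", "asp_2": "D"}


def predicted_inactive(residues: dict) -> bool:
    """Return True if any conserved catalytic residue is missing or substituted."""
    return any(residues.get(site) != aa for site, aa in EXPECTED.items())


# Hypothetical aligned residues for two domains (not real data).
canonical = {"lys": "K", "asp_1": "D", "asp_2": "D"}
variant = {"lys": "K", "asp_1": "N", "asp_2": "D"}

print(predicted_inactive(canonical))  # False
print(predicted_inactive(variant))    # True

# 50 predicted-inactive domains relative to 518 kinase genes; this is an
# approximation, since a few kinases carry two kinase domains.
inactive_share = 50 / 518
print(f"{inactive_share:.1%}")  # 9.7%
```

As the Wnk example in the text shows, a rule of this kind only flags candidates: a domain can fail the check and still be catalytically active through a modified mechanism.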

The 50 "inactive" kinase domains fall into 
three main categories. First are domains that 
may act as modulators of other catalytic do- 
mains. GCN2 and JAK (Janus kinase) family 
kinases have dual ePK domains, one of which 
is inactive and may regulate the active domain
(26). Similarly, the inactive ePK domain
of receptor guanylate cyclases (RGCs)
is thought to regulate the activity of the 
neighboring guanylate cyclase domain, in a 
manner that is modulated by ATP binding 
and phosphorylation (27). 

Second are other kinases with high similarity to the canonical ePK
domain profile. These include the Ras pathway scaffold proteins KSR
(kinase suppressor of Ras) (23) and the previously undescribed KSR2,
titin, ILK (integrin-linked kinase), PSKH2 (protein serine kinase H2),
and unpublished kinases from the STLK and Trbl families. The scaffold
protein CASK (calcium/calmodulin-dependent serine kinase) contains an
inactive protein kinase domain and an inactive guanylate kinase domain,
both of which act as protein-protein interaction domains (28, 29). This
group also contains several RTKs where an inactive kinase may dimerize
with and act as a substrate of another RTK: Ryk, CCK4, the ephrin
receptors EphA10 and EphB6, and ErbB3 (24).

Third is a group whose members have very weak similarities to the kinase
domain profile, and may have quite divergent functions. Of 37 "weak"
kinase domains (whose kinase HMM E-value score is greater than 1e-30),
26 lack one or more catalytic residues. Note, however, that other weakly
scoring kinases have been shown experimentally to have catalytic
activity, including Bub1 (e-11 E value), VRK1 (e-10), PRPK (e-5), and
haspin (e-3) (30-33).
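
The E-value cutoff used to call a domain "weak" can be illustrated with the four experimentally active kinases named above. This sketch assumes that the shorthand "e-11" means an E-value of 1e-11; the dictionary and function names are hypothetical and not part of the original analysis.

```python
WEAK_THRESHOLD = 1e-30  # domains scoring above this E-value are called "weak"

# E-values quoted in the text, reading "e-11" as 1e-11 (an assumption).
e_values = {"Bub1": 1e-11, "VRK1": 1e-10, "PRPK": 1e-5, "haspin": 1e-3}


def is_weak(e_value: float, threshold: float = WEAK_THRESHOLD) -> bool:
    """A larger E-value means a poorer match to the kinase HMM profile."""
    return e_value > threshold


weak = [name for name, e in e_values.items() if is_weak(e)]
print(weak)  # ['Bub1', 'VRK1', 'PRPK', 'haspin']
```

All four fall on the "weak" side of the cutoff despite having demonstrated catalytic activity, which is the caution the paragraph raises about reading function from profile score alone.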

Other Functional Domains in Protein 
Kinases 

Most protein kinases act in a network of 
kinases and other signaling effectors, and are 
modulated by autophosphorylation and phos- 
phorylation by other kinases. Other domains 
within these proteins regulate kinase activity, 
link to other signaling modules, or subcellularly
localize the protein. We identified 83
additional types of domain present in 258 of
the 518 kinases, using profiles from the Pfam
HMM collection (Table 3). In general, mem- 
bers of the same kinase family have the same 
domain structure, but some domain shuffling 
is seen, where individual members of fami- 
lies have gained or lost a domain and so may 
have altered function. For instance, the death 
domain is found in all four IRAK kinases as
well as in single members of the DAPK and 
RIPK families. 

The most common domains mediate interactions with other signaling
proteins: 24 kinases contain Src homology 2 (SH2) domains that bind to
phosphotyrosine residues; other domains link to small guanosine
triphosphatase (GTPase) signaling (38 kinases with RhoGEF, RhoGAP, RBD,
PBD, RGS, CNH, HR1, or TBC domains), lipid signaling (42 kinases with
DAG_PE, C2, PX, or PH domains), and calcium signaling (28 kinases with
CaM, IQ, or OPR/PB1 domains); target the protein to the cytoskeleton
(seven kinases with spectrin, cofilin, myosin head, or FCH domains); or
mediate interactions with other proteins (46 kinases: Death, SH3, SAM,
LIM, or ankyrin domains) or RNA (three kinases with RRM, DSRM, and
putative RNA binding Tudor domains). Most of the domains found in new or
extended sequences are the same as those already seen in other family
members, but some unpredicted domains are found, such as the previously
unpublished leucine-rich repeat kinase (LRRK) family, containing arrays
of leucine-rich repeats, as well as armadillo and ankyrin repeats.

[Table 3 row: Protein kinase C terminal domain, 44]

Most of the 58 RTKs, 12 receptor serine-threonine
kinases, and five receptor guanylate
cyclases also have recognizable ligand-binding
and other extracellular domains, along with 
clear signal peptides and transmembrane re- 
gions. Several nonreceptor tyrosine kinases are 
also targeted to the membrane by lipidation or 
protein-protein interactions. Three kinases are 
targeted to the endoplasmic reticulum, five or 
six are likely to be mitochondrial, and most of 
the rest are thought to be cytoplasmic, nuclear, 
or both. 

Two hundred and sixty kinases contain no 
additional Pfam domains. Many are small 
proteins containing little more than an ePK 
domain and may be controlled by additional 
regulatory subunits, such as cyclins, which 
control CDK activity. Others contain con- 
served sequences that have not yet been clas- 
sified as domains and whose functions are 
unknown. 

Thirteen kinases have dual ePK domains, 
in which both domains appear to be active 
[six ribosomal S6 kinase (RSK) family ki- 
nases and two Trio family kinases] or the 
second domain is inactive (the four JAK fam- 
ily kinases and GCN2). The two RSK do- 
mains are involved in a kinase relay: Erk 
phosphorylates and activates the CAMK-group
domain of RSK2, leading to autophosphorylation
on a linker region that then allows
PDK1 to phosphorylate and activate the
second AGC-group kinase domain (34). 

Kinase Pseudogenes 

The genome also contains many nonfunctional 
copies of kinase genes that are not expressed or 
encode degenerate, truncated proteins. These ki- 
nase pseudogenes are derived mostly from ret- 
roviral transposition and genomic duplications. 
Pseudogenes can confuse gene predictions, 
cross-hybridize with probes for functional 
genes, and contribute to disease by homologous 
recombination with their parental genes (35, 
36). We identified 106 pseudogenes containing 
similarity to the ePK domain or to an aPK (table 
84); several other pseudogene fragments that 
lack a kinase domain were found but are not 
included here. All but two pseudogenes 
have open reading frames (ORFs) interrupt- 
ed by stop codons or frameshifts, which 
were verified by multiple independent se- 
quence sources. These ORFs typically have 
high protein sequence similarity to a func- 
tional ("parent") kinase; most are partial 
gene fragments. The two putative pseudo- 
genes with complete ORFs (ClC2a-rs and 
STLK6-rs) lack introns and obvious pro- 
moters, are absent from EST databases, 
have >98.5% DNA sequence identity to 
their parents, and contain remnants of 
polyA tails in their genomic sequences. 
They are probably young processed pseudo- 
genes whose sequences have not yet 
diverged. 

Seventy-five kinase pseudogenes lack introns. Some are duplications of
intronless genes or of single exons of larger genes, but most appear to
derive from viral retrotransposition of a processed transcript.
Additionally, some intron-containing pseudogenes such as AurAps2 contain
some parental introns but lack others, and may result from
retrotransposition of a partially spliced transcript.

Table 3. Most common Pfam domains in protein kinases. See table S7 for a fuller listing.

Twenty-nine kinase pseudogenes contain 
clear introns and probably arose by genomic 
duplication. In some cases, these are part of a 
large duplicon (2, 3) containing multiple duplicated
genes. Such cases include two p70
ribosomal protein S6 kinase (p70S6K) pseu- 
dogenes, which appear to arise from intrach- 
romosomal duplications of the p70S6K locus. 
These duplications are 20 kb and 70 kb in 
length, and are 90 to 95% identical in DNA 
sequence to the original locus. 

A few pseudogenes have no obvious hu- 
man parent but have functional orthologs in 
rodents and probably indicate the decay of 
previously functional genes. They include the
polo-like kinase SGK384ps, whose mouse 
ortholog is intact, and the human orthologs of 
rat guanylate cyclases CGD and KSGC. 

Although pseudogenes appear to be evolutionary
relicts, some may have some residual
or cryptic function. Many pseudogenes
are transcribed: 26 kinase pseudogenes are 
seen in cDNA and EST databases (table S4), 
some represented by as many as 50 ESTs. 

The prevalence of pseudogenes varies greatly
between kinase families (Table 1) (table S4).
The MARK (microtubule affinity-regulating kinase)
family displays the largest ratio of
pseudogenes to functional genes (28/4), followed
by p70S6K (4/1), Erk3 (4/1), phosphorylase
kinase γ1 (3/1), and casein kinase 1α
(3/1). Frequent copying of a gene by retroviral
insertion might indicate a functional role for the 
gene in retroviral function, but no viral function 
or source for MARK genes is yet known. 

Comparison with Sequence Databases 

We compared our nonredundant set of cloned and predicted kinase protein
sequences with the published predictions from Celera and public genome
projects (2, 3) and with a recent release of the public GenPept database
(10). Figure 2 shows the extent to which the best match in each database
agrees with our sequences. All three databases contain at least
fragments of most kinases, but far fewer genes are in perfect agreement.
In many cases the public sequences come from partial clones that lack
the NH2- or COOH-termini (43 and 15 genes, respectively), often from
large-scale sequencing projects that do not individually annotate
sequences. In other cases, the public sequence has overextended the true
start site where upstream stop codons are absent. We used similarity to
rodent orthologs to trim sequences to a strongly predicted translational
start site in nine cases. Other discrepancies come from sequencing
errors, alternative splicing, and sequencing of partially spliced cDNAs.
In all cases, our unique sequence is supported by strong sequence
similarity to homologs or by cDNA cloning.

In some cases, our additional sequence 
greatly changes the predicted function of a 
gene, such as the addition of a predicted 
signal peptide to the Lmr1 tyrosine kinase;
the previously published form of this gene 
(AATYK) was based on a cDNA lacking this 
domain, which created a cytoplasmic protein 
(37). We also identified full-length forms of
two related new genes, Lmr2 and Lmr3, 
which together form a new family of predict- 
ed receptor tyrosine kinases with vestigial 
extracellular regions. Their biological roles 
are currently under investigation. 

Gene predictions from the public genome project (Ensembl) and Celera
differ from those we obtained largely as a result of misprediction of
exon boundaries and splitting of single genes into multiple predicted
genes. Ensembl incorporates public sequence data from RefSeq and
Swiss-Prot, giving perfect agreement with our sequences for many genes.
The distance between the GenPept and Ensembl traces in Fig. 2 indicates
the extent of recent new sequence publication from large-scale cDNA
sequencing projects and individual cloning driven by genomic data. The
Celera predictions were entirely computational, and so have very few
perfect predictions. However, for genes not present in public databases,
many Celera predictions agree better with our sequences than those from
Ensembl (not shown).

Fig. 2. Comparison of our kinase protein sequences with the best matches in Celera, Ensembl, and
GenPept databases. Each point shows the number of genes for which the percentage difference
between our sequence and the database is greater than the value indicated. Insert table indicates
number of sequences where differences between our sequence and closest database match is >2%,
>50%, or >95%.

A comparison with "known" protein kinases encounters several problems
with over- and under-classification of genes as kinases, as well as with
partial sequences. GenPept contains multiple sequences for most kinases,
many of which are partial fragments or contain multiple sequencing
errors. It also contains chimeric genes such as the nonexistent zona
pellucida kinase (38). The proliferation of different names for the same
kinase adds to the problem of creating an accurate nonredundant list of
kinases. Ensembl and Celera predictions include several pseudogenes (36
and 29, respectively), and also annotate as kinases a number of genes
that are homologous to noncatalytic regulatory subunits of protein
kinase complexes or to kinases other than protein kinases.

All 518 kinases are found in at least one of the expressed sequence
databases (dbEST, Incyte, and GenBank cDNAs), indicating that all are
genuine, transcribed genes. Many kinases are expressed in low amounts in
a restricted distribution, so the presence of all kinases in EST or cDNA
databases implies that these databases contain fragments of most human
genes.

Summary

The sequencing of the human genome has provided a starting point for the
identification of most, if not all, human members of the eukaryotic
protein kinase superfamily, and many atypical kinases. We used the
published human genome sequences, combined with other sequence databases
and directed cloning and sequencing of individual genes to discover,
extend, or correct 125 kinase gene sequences, and define a nonredundant
set of 518 human protein kinase genes. This set accounts for almost all
human protein phosphorylation and collectively mediates most cellular
signal transduction and many other processes. Comparative sequence
analysis and mapping predict function and possible disease association
for many kinases, and give clues to their evolutionary origin.
Comprehensive kinome-scale approaches are now feasible, including RNA
and protein expression profiling, and high-throughput functional assays
using constitutively active and dominant-negative kinase constructs.
These will facilitate the study of the role of kinases in a wide range
of biological processes, and the development of selective inhibitors and
activators for research and therapeutic purposes.

This large and well-curated sequence set also casts a light on the
current state of human genome analysis. All 518 genes are covered by
some EST sequence, and ~90% are present in gene predictions from the
Celera and public genome databases, although those predictions are often
fragmentary or inaccurate and are frequently misannotated (39).

Cheaper Chips Find a Good Fit with Hit Validation
Drug Discovery & Development - February 03, 2005 

Protein microarrays promise to facilitate study of proteome interactions, and their use is growing despite proteins' inherent 
instability 

Gina Shaw 

Shaw is a contributing editor based in Marlboro, N.J.

In November 2004, protein array technology hit a milestone: Invitrogen Corp., Carlsbad, Calif., released the world's first commercially
available high-density human protein microarray. The ProtoArray, which contains more than 1,800 unique human proteins, represents a
cross-section of gene families including pharmaceutically relevant protein classes such as kinases and membrane-associated,
cell-signaling, and metabolic proteins.

The pharmaceutical industry's demand for protein arrays has been high, and for good reason, says Steven Bodovitz, PhD, principal and
cofounder of BioPerspectives, San Francisco, and an expert on protein biochips, protein biomarkers, and proteomics. "Mass
[spectrometric] studies are very good at discovering potential targets and biomarkers, but they're not so good at following up with
validation," he says. "You need to be able to take your initial findings and then see the protein change, and follow up and test it over and
over again in different tissues, at different time points, under different conditions. To achieve that, you need a lower-cost,
higher-throughput screening technology, which is a fantastic fit for the protein biochip."

Indeed, some experts have long been saying that protein arrays could potentially surpass DNA microarrays in their scientific impact. But
moving from expectation to reality with these arrays has proven to be a longer journey than first imagined when protein arrays came on
the scene in the late 1990s. "It was first thought that protein biochips would just be an extension of DNA microarrays, and that hasn't
exactly panned out," says Bodovitz.

That's because proteins have proven to be much trickier to work with in array format than 
their genomic counterparts. First of all, there are issues of stability. Membrane proteins, for 
example, make up the majority of potential drug targets, but they're particularly challenging to
stabilize. Then there's the choice of immobilization technique, which determines how well the 
target protein presents itself to the capture agent, and the problem of nonspecific binding. 
And of course, proteins are inherently unstable outside their natural habitat of living cells, 
making them much more challenging than DNA to tag and manipulate. 

Despite these challenges, though, the protein array market continues to grow. What was a 
$122 million market in 2002 will jump to $545 million by 2008, predicts an August 2004 
report, Protein Biochips: Parallelized Screening for High-Output Biology. The report was 
released jointly by BioPerspectives; Bachmann Consulting in Nesoddtangen, Norway; and 
the NMI Natural and Medical Sciences Institute at the University of Tubingen, Germany. "The 
industry has begun to make the transition from a few years ago, where there were a lot of 
grandiose expectations, to very specific, aggressive approaches to developing protein 
biochips," says Bodovitz. 

Although Invitrogen has lately been methodically gobbling up competitors and was the first 
company to offer a human protein biochip, the market for protein arrays is unlikely to be 
nailed down by a few leading vendors in the way that Affymetrix virtually cornered the market
on DNA microarrays. "The genome is basically a limited set of information. Once you have a 
DNA microarray that covers the whole human genome, there is not a lot of room for 
something else," says Bodovitz. "That won't be the case at all with protein biochips. You have 
the capture and the interaction sides, which are very different technologies, and no one's yet 
covering the whole proteome, so no one big company is dominating." 




[Figure] Comparative analysis of normal and cancer sera using the Schleicher & Schuell Serum Biomarker Chip.
Scatter plots of protein abundance ratios comparing serum from breast cancer patients with serum from healthy,
age- and gender-matched individuals. Serum proteins were labeled with Biotin-ULS and Fluorescein-ULS, pooled,
and probed against the Serum Biomarker Chip. (Source: Schleicher & Schuell)



Bright Shiny Beads

With all of the challenges inherent in developing solid-surface arrays that can hold thousands of proteins, all with
different properties, it's little wonder that another approach, protein-interaction assays in solution, is also
drawing attention. A leader in this market is Luminex Corp., Austin, Texas, whose bead-based xMAP system can
provide up to 100 assay results from a single drop of sample.

Like the planar arrays, Luminex's protein arrays also attach capture reagents to the surface, in this case, a bead
surface. Each bead is color-coded to distinguish what reaction is being demonstrated.

"The majority of protein array tools out there use a flat-surface, two-dimensional array. We try to mimic as well
as possible what's actually happening in the cell. Biology doesn't happen in a tube," says James Jacobson, PhD,
Luminex's vice president of research and development. "The beads allow a lot of things to happen that are
advantageous. Because they're small and in suspension, we can take advantage of very favorable reaction kinetics.
We also have a very high sample throughput; 100 results in a few seconds is pretty fast."

Luminex has moved away from direct array sales and instead distributes them through a range of partnerships with
companies like Bio-Rad Laboratories, Hercules, Calif., and Qiagen, Valencia, Calif., which provide add-on
offerings like cytokine assays and assays for apoptotic markers and specific phospho-proteins, associated
software, and reagents and kits that allow users to develop their own assays for the multiplexed environment.

That is an environment that interests many pharmaceutical scientists. "Multiplexing in protein arrays is
promising, especially for cell signaling and signal transduction," says Eli Lilly's Myrtle Davis, DVM, PhD. "In
those arenas, it's all about pathways, not just a single protein. We need to be able to identify a set of markers
that indicate a particular modulation, not just one protein, and for that you need to see multiple protein
interactions at one time."

[Figure] Luminex's bead-based protein arrays attach reagents to the surface of color-coded beads, which can be
studied in suspension at a rate of 100 targets within a few seconds. (Source: Luminex)

A capture audience

What's the best approach to protein arrays? It depends on whom you ask, and what
your goal is. Protein arrays can generally be broken down into two main categories:
capture arrays and interaction arrays. Capture biochips use immobilized capture
agents to capture target proteins, while interaction biochips have immobilized proteins
or peptides which are used to identify functions or tease out interactions with other
proteins or small molecules.

"For capture arrays, the big issue is content. In DNA microarrays, the oligo on the
surface captures the target sequence and binds to it, but to capture a protein, you
need a different, more complicated agent," says Bodovitz. "Historically, the problem
has been that there aren't enough high-quality capture agents to populate a broad or
high-density protein biochip platform. Antibodies don't work if they are denatured, for
example."

Early solutions that appeared to have promise for rapidly generating research reagent
antibodies don't appear to have panned out, so many companies are narrowing their
sights to more focused content. "You organize a limited number of antibodies, say 30
or 40, maybe, sometimes more. The key is to give it a focus," says Bodovitz. "The
most common focus is on cytokines. It's possible to cover a large number of the
cytokine population on a biochip."



"The best arrays now only have 100 antibodies or less," says protein array expert 
Michael Snyder, director of the Yale Center for Genomics and Proteomics, New 
Haven, Conn. "I think it will take some new technologies to measure thousands of 
things using antibody arrays or other sorts of capture reagents. I don't think we know
yet what are the best capture reagents, [be they] aptamers, monoclonal antibodies, or 
single-chain antibodies." Snyder thinks there are ways of improving the specificity of 
those reagents, ways that have yet to be explored. 

One of the leaders in the capture array market is Schleicher & Schuell (S&S), Keene, 
N.H., which in June 2004 released its S&S Seaim Biomarker Chip, the first high- 
density antibody chip specific to serum biomarkers related to human cancers of every 
major organ. The chip uses a cisplatin labeling chemistry that tags small molecules to 
serum proteins to label each sample with two different tags. The samples are then 
pooled and probed against the antibody microarray in a competitive binding fashion.

"The translational cancer research community didn't really have an easy-to-use tool
that did not require specialized training, so the serum biomarker chip was probably the 
first product that addressed that need. Researchers can now identify a pattern of the 
abundance of protein in a diseased individual versus a matched healthy person, and, 
for example, identify a molecular signature that acts as a surrogate end point rather 
than a clinical end point, 12 to 16 weeks before you would see tumor response to a 
therapy," says business development manager Robert Negm, PhD.

As little as 8 mL can be processed to discriminate the abundance of more than 120 
cancer serum biomarker proteins between two individuals. S&S has taken great pains 
to eliminate nonspecific binding, says Negm, using its FAST Quant TH1/TH2 assay 
(an alternative to microplate ELISAs for the assaying of multiple cytokines) to 
demonstrate that the antibody is not binding nonspecifically to another protein in the 
serum sample. 

Still, their antibody-based nature remains the biggest limitation for capture arrays,
says Myrtle Davis, DVM, PhD, senior research scientist at Eli Lilly and Co., 
Indianapolis. "Antibody-protein interaction is a wonderful thing that we can exploit as a 
means to pull out proteins from complex mixtures, but we do know that it's extremely 
limited." 



At this point, she says, the quality 
of the protein array depends on 
the quality of the antibody. "One 
of the questions about antibody 
arrays that I always ask every 
vendor is this: if they use a 
chemistry to bind antibody to a solid phase, for example, how can they be sure that 
the antibody is bound in a conformation that allows it to capture antigens?" she asks. 
"You want to have some quality control on well-to-well variation." 

Like Snyder, she thinks improved capture reagents are an important goal. "We need 
some other capture chemistries to be defined, so that these arrays can start to be 
more useful to the protein community." 

Also on Davis' wish list is a protein binding technology that is less vendor-specific. 
"Reagents are often very specific to the vendor, for example, and we've found that 
when you start to employ some of these technologies, you're tied to the vendor," she
says. "If a company goes under, you've set up an entire assay system around a 
technology you can no longer use." 

Interaction in action 

In terms of interaction arrays, Invitrogen appears to be unchallenged. Within the last
year, Invitrogen acquired Protometrix, Branford, Conn., the developer of the Yeast
ProtoArray, precursor to the Human ProtoArray. It also licensed rights to specific fields
of use for more than 30 patents in the area of protein microarray development from 
Zyomyx Inc. 

Interaction arrays are "the wild card," says Bodovitz. "If you talk to most protein
biochemists, their reaction is, 'This can't possibly work.' Protein biochemistry is
notoriously finicky, and people usually highly optimize any one reaction they're 
studying. Now you're talking about doing thousands of biochemical reactions on a 
chip surface? First, you're only studying them under one condition, plus immobilization 
could also have an effect. This means your data could only be applicable to one set of
conditions, and it may not be representative of anything."

He put this question to Protometrix scientists before the Invitrogen acquisition. "They 
countered that when they compared the reactions on the chip versus taking proteins 



Table of No Content 

If you had a protein chip with no proteins on 
it, then you'd have the ProteinChip system 
from Ciphergen Biosystems Inc., Fremont, 
Calif. The ProteinChip array consists of a 
variety of preactivated, chemically treated 
surfaces, designed for expression profiling 
when you're not sure just what protein you're 
looking for. Its main application is biomarker 
discovery and assay. 
The ProteinChip, says Kate Gilbert, 
Ciphergen's director of marketing, is ideally 
suited to de novo discovery. "A researcher 
may be looking at a disease biomarker or 
efficacy biomarker and trying to predict 
response, and in many cases, may not be 
really sure what protein is going to prove to be 
a good marker," Gilbert says. "If you have 
antibodies down there, you're only going to 
find the proteins you have antibodies for. This 
approach, on the other hand, allows you to 
discover any type of protein, providing that it 
will bind under the chromatographic conditions 
you've selected." 

The ProteinChip uses broad conditions in order 
to capture as much of the proteome as 
possible. The process is simple, designed as a 
benchtop system usable by individual 
researchers. First, a biological sample is put on 
the chip, and subpopulations of proteins are 
captured, retained and purified directly on the 
chip by affinity capture. The ProteinChip 
Reader uses a laser to desorb the retained 
proteins into a time-of-flight mass 
spectrometer; and accompanying software 
records and presents the molecular weight of 
the proteins found. 
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"... chips allow you to segregate proteins and ... wouldn't ... looked at one sample," says Gilbert.

individually and doing it in solution, they got the same kind of results," he says.
"Immobilization, at least, has been controlled for and apparently has very little impact
on results. If you can make an initial discovery through a fast, very broad screen, and
it holds up in subsequent assays, that's a very powerful method."

Pharmaceutical scientists are also attracted to Invitrogen's interaction arrays for
specificity profiling. "The techniques currently used to assess antibody specificity are
relatively crude. Western blots and such will largely characterize an antibody as specific or nonspecific, but they'll fail to identify exactly
what the cross-reactivity is," says Predki. "With microarray experiments, in about a half a day, you can not only assess specificity but can
immediately determine cross-reactive proteins."
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Analysis of yeast protein kinases using 
protein chips 

Heng Zhu, James F. Klemic, Swan Chang, Paul Bertone, Antonio Casamayor,
Kathryn G. Klemic, David Smith, Mark Gerstein, Mark A. Reed & Michael Snyder

We have developed a novel protein chip technology that allows the high-throughput analysis of bio- 
chemical activities, and used this approach to analyse nearly all of the protein kinases from Saccha- 
romyces cerevisiae. Protein chips are disposable arrays of microwells in silicone elastomer sheets placed 
on top of microscope slides. The high density and small size of the wells allows for high-throughput 
batch processing and simultaneous analysis of many individual samples. Only small amounts of protein 
are required. Of 122 known and predicted yeast protein kinases, 119 were overexpressed and analysed 
using 17 different substrates and protein chips. We found many novel activities and that a large num- 
ber of protein kinases are capable of phosphorylating tyrosine. The tyrosine phosphorylating enzymes 
often share common amino acid residues that lie near the catalytic region. Thus, our study identified a 
number of novel features of protein kinases and demonstrates that protein chip technology is useful 
for high-throughput screening of protein biochemical activity. 



Introduction 

The sequencing of entire genomes has resulted in the identification 
of large numbers of novel ORFs. The challenge ahead is to gain 
information about the function of identified genes. Currently,
significant effort is devoted to understanding gene function by
mRNA expression patterns and by gene disruption phenotypes.
Important advances in this effort have been possible, in part, by the 
ability to analyse thousands of gene sequences in a single experi- 
ment using gene chip technology. Much information about gene 
function comes from the analysis of the biochemical activities of the 
encoded protein. Currently, these types of analyses are done by
individual investigators studying a single protein at a time. This can 
be time consuming because it can take years to purify and identify a 
protein on the basis of its biochemical activity. The availability of an 
entire genome sequence makes it possible to perform biochemical 
assays on every protein encoded by the genome. As such, it would 
be extremely powerful to analyse hundreds or thousands of protein 
samples using a single protein chip. Such approaches lend them- 
selves well to high-throughput experiments in which large amounts 
of data can be generated and analysed. 

Several groups have devised methods for expressing large num- 
bers of proteins with potential utility for biochemical genomics in S. 
cerevisiae. InVitrogen has cloned ORFs into an expression vector 
that uses the GAL promoter and fuses the protein to a HISX6 tag; 
thus far they have prepared and confirmed expression of approximately
2,000 yeast protein fusions. Using a recombination strategy,
Eric Phizicky's group has cloned approximately 85% of the yeast
ORFs into a vector that produces GST fusion proteins under the
control of the CUP1 promoter (inducible by copper). Using a
pooling strategy, they identified the genes encoding several important
biochemical activities (for example, phosphodiesterase and
Appr-1''-P-processing activities). Strategies to analyse large numbers
of individual protein samples have not been described.

We have also overproduced yeast proteins as GST fusions and
developed a protein chip technology suitable for rapidly analysing
large numbers of samples; this approach was applied to the analysis 
of nearly all yeast protein kinases. The yeast genome has been 
sequenced and contains approximately 6,200 ORFs greater than 
100 codons in length. Of these, 122 are predicted to encode protein 
kinases, and 24 of these protein kinase genes have not been studied 
previously^. Except for two histidine protein kinases, all of the yeast 
protein kinases are members of the Ser/Thr family; tyrosine kinase 
family members do not exist, although seven protein kinases that 
phosphorylate serine/threonine and tyrosine have been reported^. 

Here we overexpressed nearly all (119) of the yeast protein kinases
and used a novel protein chip technology to analyse their specificity 
using 17 different substrates. We find that 32 kinases preferentially 
phosphorylate one or two substrates, and 27 kinases readily phos- 
phorylate poly(Tyr-Glu), suggesting that there are many more 
potential tyrosine kinases than were known previously. Correlation 
of functional specificity with amino acid sequence information 
reveals that the kinases that use poly(Tyr-Glu) as a substrate contain 
amino acids near the catalytic region that are distinct from those 
that do not. We expect this technology to be valuable for the analy- 
sis of entire proteomes and the information to be very valuable to 
researchers studying kinase-substrate reactions. 

Results 

Yeast kinase cloning and protein purification 

Using a recombination-directed cloning strategy, we cloned the
entire coding regions of 122 yeast protein kinase genes in a high-copy
expression vector (pEG(KG)) that produces GST fusion
proteins under the control of the galactose-inducible GAL1 promoter
(Fig. 1a). GST::kinase constructs were rescued into
Escherichia coli, and sequences at the 5' end of each construct
were determined. We successfully cloned 119 of the protein
kinase genes in-frame. The three kinase genes that we did not
clone were very large (4.5-8.4 kb).



Departments of Molecular, Cellular, and Developmental Biology, Electrical Engineering, Applied Physics, Cellular and Molecular Physiology, and Molecular
Biophysics and Biochemistry, Yale University, New Haven, Connecticut, USA. Correspondence should be addressed to M.S. (e-mail: michael.snyder@yale.edu).



nature genetics • volume 26 • november 2000 






now technology © 2000 Nature America Inc. • http://genetics.nature.com



[Fig. 1a panel label: yeast ORF]




The GST::kinase fusion proteins were overproduced in yeast and
purified from 50-ml cultures using glutathione beads and standard
protocols. For the case of Hog1p, in the last five minutes of
induction the yeast cells were treated with high salt to activate the
enzyme; for the rest of the kinases, synthetic media (URA-/raffinose)
was used. Immunoblot analysis of all 119 fusions using anti-GST
antibodies revealed that 105 of the yeast strains produced
detectable GST::fusion proteins; in most cases the fusions were full
length. Up to 1 µg of fusion protein per millilitre of starting culture
was obtained (Fig. 1b), but we failed to detect 14 of 119
GST::kinase samples by immunoblotting analysis, despite repeated
attempts. Presumably, these proteins are not stably overproduced
in the pep4 protease-deficient strain used, or these proteins may
form insoluble aggregates that do not purify using our procedures.
Although this procedure was successful, purification of GST fusion
proteins using 50-ml cultures is time consuming and is not applicable
for preparing thousands of samples. Therefore, we have developed
a procedure for purifying proteins in a 96-well format. Using
this procedure, we prepared and purified 119 GST fusions in 6
hours with approximately twofold higher yields per millilitre of
starting culture relative to the 50-ml method.

Protein chip design 

We developed protein chips to conduct high-throughput bio- 
chemical assays of these 119 protein kinases (Fig. 2). These chips 
consist of an array of microwells in a disposable silicone elas- 
tomer, poly(dimethylsiloxane) (PDMS; ref. 10). Microwell arrays 
allow small volumes of different analytes to be densely packed on 
a single chip, yet remain physically segregated during subsequent 
batch processing. Proteins were covalently attached to the wells 
using a crosslinker, 3-glycidoxypropyltrimethoxysilane (GPTS).
Up to 8 × 10^[exponent illegible] µg/µm² of protein can be attached to the surface.

For the purposes of the protein kinase assays described here, we 
configured the protein chip technology to be compatible with 
standard sample handling and recording equipment. Using



Fig. 1 Strategy to overproduce yeast
protein kinases. a, Using the recombination
strategy, 119 yeast protein
kinases were cloned in a high-copy
URA3 expression vector (pEG(KG))
that produces GST fusion proteins
under the control of the galactose-inducible
GAL1 promoter. GST::kinase
constructs were rescued into E. coli,
and sequences at the 5' end of each construct
were determined. The whole
procedure was repeated when mutations
were discovered. b, Immunoblots
of GST::kinase fusion proteins purified
as described. From 3 attempts we
purified 105 kinase proteins. In spite
of repeated attempts, we were
unable to detect 14 of 119 GST
fusions by immunoblotting analysis,
for example, Mps1p in the lane
labelled with a star.



radioisotope labelling (³³P), the
kinase assays described below
and manual loading, we tested a
variety of microarray configurations
and found that the following
chips produced the best
results: round wells 1.4 mm in
diameter and 300 µm deep
(approximately 300 nl), in a
10 × 14 rectangular array configuration
with a 1.8 mm pitch. We
then made a master mold of 12
of these arrays and repeatedly cast microarrays for the protein
kinase analysis. Chips were placed atop microscope slides for handling
purposes (Fig. 2a); the arrays covered slightly more than one-third
of a standard microscope slide and we typically used two
arrays per slide (Fig. 2b). Although we used a manual pipette
method to place proteins in each well, automated techniques may
also be used. In addition, this protein chip configuration may also
be used with other tagging methods such as fluorescent antibodies.
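
The array geometry quoted above can be checked with a few lines of arithmetic. The following is only a sketch: the 25 mm x 75 mm slide size is an assumption (the text says only "standard microscope slide"), the function name is hypothetical, and the footprint ignores any elastomer border around the outermost wells.

```python
# Values quoted in the text.
ROWS, COLS = 10, 14
PITCH_MM = 1.8
WELL_DIAMETER_MM = 1.4
ARRAYS_PER_MOLD = 12

# Assumption (not stated in the text): slide dimensions of 25 mm x 75 mm.
SLIDE_LENGTH_MM = 75.0


def array_footprint_mm(rows, cols, pitch_mm, well_diameter_mm):
    """Outer extent of a rectangular well array: (n - 1) pitches plus one well diameter."""
    short_side = (rows - 1) * pitch_mm + well_diameter_mm
    long_side = (cols - 1) * pitch_mm + well_diameter_mm
    return short_side, long_side


wells_per_array = ROWS * COLS
wells_per_mold = wells_per_array * ARRAYS_PER_MOLD
short_side, long_side = array_footprint_mm(ROWS, COLS, PITCH_MM, WELL_DIAMETER_MM)
fraction_of_slide_length = long_side / SLIDE_LENGTH_MM

print(f"{wells_per_array} wells per array, {wells_per_mold} wells per mold")
print(f"well footprint about {short_side:.1f} mm x {long_side:.1f} mm")
print(f"long side spans {fraction_of_slide_length:.0%} of the assumed slide length")
```

Under these assumptions the wells alone span roughly one third of the slide length, which is in line with the statement that each array, border included, covers slightly more than one-third of a slide and that two arrays fit on one slide.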

Large-scale kinase assays using protein chips 

All 119 GST::protein kinases were tested for in vitro kinase activity
in 17 different assays using ³³Pγ-ATP and the following 17 substrates:
(i) the kinases themselves (autophosphorylation); (ii)
bovine histone H1 (a common kinase substrate); (iii) bovine casein
(a common substrate); (iv) myelin basic protein (a common substrate);
(v) Axl2 carboxy terminus-GST (Axl2 is a transmembrane
phosphoprotein involved in budding); (vi) Rad9 (a phosphoprotein
involved in the DNA damage checkpoint); (vii) Gic2 (a phosphoprotein
involved in budding); (viii) Red1 (a meiotic
phosphoprotein important for chromosome synapsis); (ix) Mek1
(a meiotic protein kinase important for chromosome synapsis);
(x) poly(tyrosine-glutamate 1:4) (poly(Tyr-Glu); a tyrosine kinase
substrate); (xi) Ptk2 (a small-molecule transport protein); (xii)
Hsl1 (a protein kinase involved in cell cycle regulation); (xiii) Swi6
(a phosphotranscription factor involved in G1/S control); (xiv)
Tub4 (a protein involved in microtubule nucleation); (xv) Hog1
(a protein kinase involved in osmoregulation); (xvi) Hog1 (an
inactive form of the kinase); and (xvii) GST (a control). For the
autophosphorylation assay, the kinases were directly adhered to the
treated PDMS wells and ³³Pγ-ATP was added; for substrate reactions,
the substrates were bound to the wells, and then kinases and
³³Pγ-ATP were added. After the reactions were completed, the
slides were washed and the phosphorylation signals were acquired
and quantified using a high-resolution phosphoimager (Fig. 3). To
identify kinase activities, the quantified signals were converted into
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Fig. 2 Protein chip fabrication and 
kinase assays. a, Kinase activities were
detected using protein chips. PDMS
was poured over the acrylic mold.
After curing, the chip containing the
wells was peeled away and mounted
on a glass slide. The next step
included modification of the surface
and then attachment of proteins to
the wells. Wells were blocked with
1% BSA before kinase, ³³Pγ-ATP and
buffer were added. After incubation
for 30 min at 30 °C, the chips were
washed extensively and exposed to
both X-ray film and a phosphoimager,
which has a resolution of 50 µm and is
quantitative. For 12 substrates each
kinase assay was repeated at least
twice; for the remaining 5 the assays
were performed once. b, An enlarged
picture of the protein chip.




[Fig. 2a panel labels: kinase assay; substrate attached; activated by GPTS]



fold increases relative to GST
controls and plotted for further
analysis (Fig. 4a).

Most (112/119; 94%) kinases
exhibited activity fivefold or
greater over background for at
least one substrate (Fig. 4a). As
expected, Hrr25p, Pbs2p and
Mek1p phosphorylated their
known substrates, Swi6p
(400-fold higher than the GST
control), Hog1p (10-fold higher)
and Red1p (10-fold higher),
respectively. Using this assay, we
found that 18 of 24 predicted protein kinases that have not been
previously studied phosphorylate one or more substrates. Several
unconventional kinases, including the histidine kinase YIL042c
and phospholipid kinase Mec1p, phosphorylate protein substrates
in trans.
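
The conversion of raw signals into fold increases, and the fivefold activity criterion, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis code: the function names, the dictionary layout and all numeric signals are hypothetical and invented for the example.

```python
def fold_increase(signal, gst_control):
    """Signal expressed as a multiple of the GST-only control."""
    if gst_control <= 0:
        raise ValueError("GST control signal must be positive")
    return signal / gst_control


def active_kinases(signals, gst_controls, threshold=5.0):
    """Kinases whose fold increase reaches the threshold on at least one substrate.

    signals:      {kinase: {substrate: raw signal}}
    gst_controls: {substrate: raw signal of the GST control in that assay}
    """
    active = []
    for kinase, by_substrate in signals.items():
        folds = [fold_increase(s, gst_controls[sub]) for sub, s in by_substrate.items()]
        if max(folds) >= threshold:
            active.append(kinase)
    return active


# Hypothetical phosphoimager counts, invented for illustration only.
gst_controls = {"Swi6": 100.0, "Red1": 200.0}
signals = {
    "kinase_A": {"Swi6": 40000.0, "Red1": 300.0},
    "kinase_B": {"Swi6": 150.0, "Red1": 2000.0},
    "kinase_C": {"Swi6": 120.0, "Red1": 260.0},
}

print(active_kinases(signals, gst_controls))
print(f"{112 / 119:.0%} of kinases active in the study")
```

In this toy data set, kinase_A and kinase_B pass the fivefold criterion and kinase_C does not; the last line simply reproduces the 112/119 figure reported above.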

To determine substrate specificity, the activity of a particular
kinase was further normalized against the average of its activity
against all substrates (Fig. 4b; all data are available at
http://bioinfo.mbb.yale.edu/genome/yeast/chip). We found that 32
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kinases had substrate specificity on a particular substrate with 
specificity index (SI) equal or higher than 2, and, reciprocally, 
most substrates are preferentially phosphorylated by a particular 
protein kinase or set of kinases. For example, the preferred substrates
for YIL042c and Mec1p were Swi6p and Axl2p. The C terminus
of Axl2, a protein involved in yeast cell budding, is also
preferentially phosphorylated by Dbf20p, Kin2p, Yak1p and
Ste20p relative to other proteins. Previous studies found that
Ste20p was localized at the tip of emerging buds similar to Axl2p,

and a ste20Δ cla4ts mutant is
unable to bud or form fully
polarized actin patches or
cables. Another example is the
phosphoprotein Gic2, which is
also involved in budding.
Ste20p and Skm1p strongly
phosphorylate Gic2p (Fig. 4b).
Previous studies suggested that
Cdc42p interacts with Gic2p,
Cla4p (ref. 28), Ste20p and
Skm1p. Our results raise the
possibility that Cdc42p may
function to promote the phosphorylation
of Gic2p by recruiting
Ste20p and/or Skm1p.
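
The specificity index described above (a kinase's fold increase on one substrate divided by its average fold increase over all substrates) can be sketched in a few lines. The function names and the example fold increases are hypothetical; only the definition and the cutoff of 2 come from the text.

```python
def specificity_index(fold_increases):
    """SI for one kinase: each fold increase divided by the mean over all substrates."""
    mean_fold = sum(fold_increases.values()) / len(fold_increases)
    return {substrate: fold / mean_fold for substrate, fold in fold_increases.items()}


def preferred_substrates(fold_increases, cutoff=2.0):
    """Substrates whose SI is equal to or higher than the cutoff."""
    si = specificity_index(fold_increases)
    return sorted(substrate for substrate, value in si.items() if value >= cutoff)


# Hypothetical fold increases for a single kinase, invented for illustration.
example = {"Swi6": 40.0, "Axl2": 4.0, "Gic2": 2.0, "casein": 2.0}

for substrate, value in specificity_index(example).items():
    print(f"{substrate}: SI = {value:.2f}")
print(preferred_substrates(example))
```

Because every value is divided by the same mean, the SI values of one kinase always sum to the number of substrates, so a kinase can exceed the cutoff on only a minority of them. That property is what makes the index a measure of preference rather than of raw activity.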
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Fig. 3 The protein chip and kinase 
assays. Position 19 on every chip indicates
the signal of negative GST control.
Mps1p at position B4 exhibited
strong kinase activities in all 12
kinase reactions, although no visible
signal was detected by immunoblot
analysis (Fig. 1b).
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Many yeast kinases phosphorylate poly(Tyr-Glu)

On the basis of sequence analysis, all but two yeast protein 
kinases belong to the Ser/Thr family of protein kinases; the two 
exceptions are members of the histidine kinase family. Proteins of 
the conventional tyrosine kinase sequence family are lacking. At 
the time we started our study, however, seven protein kinases
(Mps1, Rad53, Swe1, Ime2, Ste7, Hrr25 and Mck1) were reported
to phosphorylate tyrosine. We confirmed that Swe1p, Mps1p,
Ime2p and Hrr25p readily phosphorylate poly(Tyr-Glu), but we
did not detect any tyrosine kinase activity for Ste7p, Rad53p or
Mck1p. Mck1p did not show strong activity in any of our assays,
but Ste7p and Rad53p are very active in other assays. Thus, their
inability to phosphorylate poly(Tyr-Glu) indicates that they
either are very weak tyrosine kinases in general or are at least
weak with the poly(Tyr-Glu) substrate. Consistent with the latter
possibility, others have found that poly(Tyr-Glu) is a poor substrate
for Rad53p (ref. 19; D. Stern, pers. comm.). We found that
23 other kinases also efficiently use poly(Tyr-Glu) as a substrate,
indicating that there are at least 27 kinases in yeast that are capable
of acting in vitro as tyrosine kinases. One of these, Rim11p,
was recently shown to phosphorylate a Tyr residue on its in vivo
substrate, Ime1p, indicating that it is a bona fide tyrosine
kinase. Thus, our experiment roughly tripled the number of
kinases capable of phosphorylating tyrosine, and has raised questions
about some of those classified as such kinases.



Correlation between functional specificity and amino 
sequences of the poly(Tyr-Glu) kinases 

The large-scale analysis of yeast protein kinases allowed us to 
compare the functional relationship of the protein kinases with 
one another. We found that many of the kinases that phosphorylate
poly(Tyr-Glu) are related to one another in their amino acid
sequences: 70% of the poly(Tyr-Glu) kinases cluster into four
distinct groups on a dendrogram in which the kinases are organized
relative to one another based on sequence similarity of
their conserved protein kinase domains (Fig. 5a). Further examination
of the amino acid sequence revealed four types of amino
acids that are preferentially found in the poly(Tyr-Glu) class of
kinases relative to the kinases that do not use poly(Tyr-Glu) as a
substrate (three are lysines and one is a methionine); one residue
(an asparagine) was preferentially located in the kinases that do
not readily use poly(Tyr-Glu) as a substrate (Fig. 5b). Most of the
residues lie near the catalytic portion of the molecule (Fig. 5b),
suggesting that they may have a role in substrate recognition.
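
One simple way to look for residues that are preferentially found in one functional class is to compare per-column residue frequencies between two groups of aligned sequences. The sketch below illustrates that idea only; the text does not describe the authors' actual procedure, and the function names, the frequency-difference threshold and the five-residue toy alignment are all hypothetical.

```python
from collections import Counter


def residue_frequency(aligned_seqs, column):
    """Fraction of sequences carrying each residue at one alignment column."""
    counts = Counter(seq[column] for seq in aligned_seqs)
    return {residue: n / len(aligned_seqs) for residue, n in counts.items()}


def enriched_residues(group_a, group_b, min_difference=0.5):
    """(column, residue, freq_a, freq_b) wherever freq_a - freq_b >= min_difference."""
    hits = []
    for column in range(len(group_a[0])):
        freq_a = residue_frequency(group_a, column)
        freq_b = residue_frequency(group_b, column)
        for residue, fa in freq_a.items():
            if residue == "-":  # ignore alignment gaps
                continue
            fb = freq_b.get(residue, 0.0)
            if fa - fb >= min_difference:
                hits.append((column, residue, fa, fb))
    return hits


# Toy aligned fragments, invented for illustration; not real kinase sequences.
tyr_glu_kinases = ["DFGKA", "DFGKS", "DFGKA", "DFGRA"]
other_kinases = ["DFGNA", "DFGNS", "DFGNA", "DFGKA"]

print(enriched_residues(tyr_glu_kinases, other_kinases))
print(enriched_residues(other_kinases, tyr_glu_kinases))
```

In the toy alignment the fourth column carries a lysine in most members of the first group and an asparagine in most members of the second, mirroring the kind of pattern reported above. A real analysis would need a proper multiple sequence alignment and a statistical test rather than a fixed frequency threshold.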

Discussion 

Large-scale analysis of protein kinases. We used a novel protein 
chip technology to characterize the activities of 119 protein 
kinases for 17 different substrates. We found that particular pro- 
teins are preferred substrates for particular protein kinases and 
that, vice versa, many protein kinases prefer particular substrates. 
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Fig. 4 Quantitative analysis of protein kinase reactions. Kinase activities were determined using a phosphoimager. The kinase signals were then transformed into fold 
increases by normalizing the data against negative control. a, Signals of 119 kinases in 4 reactions were shown in log scale. The fold increases ranged from 1 to 1000-fold.
The numbers on the axis indicate the particular kinase that was analysed (for reference numbers, see Table A, http://genetics.nature.com/supplementary_info/).
b, To determine substrate specificity, specificity index (SI) was calculated using the following formula: $SI_{ir} = F_{ir} / [(F_{i1} + F_{i2} + \cdots + F_{ir})/r]$, where i represents the ID of a kinase
used, r represents the ID of a substrate, and $F_{ir}$ represents the fold increase of a kinase i on substrate r compared with GST alone. Several examples of kinase specificity
are shown when SI is greater than 3. The entire set of fold increase data can be retrieved from our web site (http://bioinfo.mbb.yale.edu/genome/yeast/chip).
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Fig. 5 Phylogenetic tree derived from the kinase core 
domain multiple sequence alignment illustrating the correlation
between functional specificity and amino
sequences of the poly(Tyr-Glu) kinases. a, Kinases that can
use poly(Tyr-Glu) as a substrate often map to specific
regions on a sequence comparison dendrogram. The
kinases that efficiently phosphorylate poly(Tyr-Glu) are
indicated in green; two kinases that weakly use this substrate
are indicated in blue. Rad53p and Ste7p, which
could not phosphorylate poly(Tyr-Glu), are indicated in yellow.
As shown, 70% of these kinases lie in four sequence
groups (circled). b, Structure of the rabbit muscle phosphorylase
kinase (PHK). The positions of residues preferentially
found in kinases that can use poly(Tyr-Glu) as a
substrate are indicated in blue (dark blue indicates a basic
residue; light blue indicates a methionine); the asparagine
residue that is usually found in kinases that do not use
poly(Tyr-Glu) is indicated in green. The conserved DFG
region that is implicated in catalysis is indicated in red,
whereas the conserved APE region of the substrate binding
domain is indicated in purple.



One concern with these studies is that it is possi- 
ble that kinases other than the desired enzyme 
are contaminating our preparations. Although 
this cannot be rigorously ruled out, analysis of 
five of our samples by Coomassie staining and 
immunoblot staining with anti-GST antibodies 
does not reveal any detectable bands in our 
preparation that are not GST fusions. 

It is important to note that in vitro assays do 
not ensure that a substrate for a particular kinase 
in vitro is phosphorylated by the same kinase in
vivo. Other factors might restrict kinase-sub- 
strate recognition in vivo such as the presence of 
additional regulatory factors and subcellular 
localization. Nevertheless, these experiments 
indicate that certain proteins are capable of serv- 
ing as substrates for specific kinases, thereby 
allowing further analysis. In this respect, these 
assays are analogous to two-hybrid studies in 
which candidate interactions are detected. Fur- 
ther experimentation is necessary to determine if 
the processes normally occur in vivo. 

Consistent with the idea that many of the sub- 
strates are likely to be bona fide substrates in vivo is the observation 
that three kinases, Hrr25p, Pbs2p and Meklp, phosphorylate their 
known substrates in our assays. Moreover, many of the kinases (for 
example, Ste20p) co-localize with their in vitro substrates (for example,
Axl2p). Thus, we expect many of the kinases that phosphorylate
substrates in our in vitro assays are likely to also do so in vivo.

Although most of the kinases were active in our assays, several 
were not. Presumably, these latter kinase preparations either lack 
sufficient quantities of an activator or were not purified under acti- 
vating conditions. For example, Cdc28p, which was not active in 
our assays, might be lacking its activating cyclins. For the case of 
Hog1p, we treated cells with high salt to activate the enzyme. As
nearly all of our kinase preparations showed activity, we presume
that at least some of the enzyme in the preparation has been properly
activated and/or contains the necessary cofactors. It is likely
that the overexpression of these enzymes in their native organism
contributes to the high success of obtaining active enzymes. It is also
possible that the use of GST fusions that are capable of dimerization
might augment activation of some kinases through trans phosphorylation.
This is not the case for Hog1, which is not activated unless
high salt is added to the medium.

Our assays identified many kinases that use poly(Tyr-Glu) as sub- 
strate. The large-scale analysis of many kinases allowed the novel 
approach of correlating functional specificity of poly(Tyr-Glu) 




[Fig. 5b legend: Basic residue; Methionine; Asparagine; DFG sequence; APE sequence]



kinases with specific amino acid sequences. Many of the residues characteristic of
the kinases that phosphorylate poly(Tyr-Glu) are basic.
This might be expected if there were electrostatic interactions
between the kinase residues and the Glu residues. The roles of some
of the other residues, however, are not obvious, such as the Met
residues on the kinases that phosphorylate poly(Tyr-Glu) and the 
Asn on those that do not. These kinase residues may confer substrate 
specificity by other mechanisms. Regardless, analysis of additional 
substrates should allow a further correlation of functional specificity 
with protein kinase sequence for all protein kinases. 

Protein chip technology. In addition to the rapid analysis of large 
number of samples, the protein chip technology described here has 
substantial advantages over conventional methods. First, the chip- 
based assays have very high signal-to-noise ratios. We found that the 
signal-to-noise ratio exhibited using the microwell chips is much 
better (> 10-fold) than that observed for traditional microtitre dish
assays (data not shown). Presumably this is due to the fact that ³³Pγ-ATP
does not bind the PDMS as much as microtitre dishes. Second,
the amount of material needed is very small. Reaction volumes are
1/20-1/40 the amount used in the 384-well microtitre dishes; less
than 20 ng of protein kinase was used in each reaction. Third, the
enzymatic assays using protein chips are extremely sensitive. Even
though only 105 fusions were detectable by immunoblot analysis,
112 had enzymatic activity greater than fivefold over background for
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at least 1 substrate. For example, Mps1p consistently exhibited the
strongest activity in many of the kinase assays, even though we have
never been able to detect this fusion protein by immunoblot analysis
(Figs 1b and 3a). Fourth, the chips are inexpensive; the material costs
less than eight cents for each array. The microfabricated molds are 
also easy to make and inexpensive. 

In addition to the analysis of protein kinases, this protein chip 
technology is also applicable to a wide variety of additional assays, 
such as ATP and GTP binding assays, nuclease assays, helicase 
assays and protein-protein interaction assays. In an independent 
study, yeast proteins were expressed as GST fusions under the much 
weaker CUP1 promoter. Although the quality of these clones has
not been established, biochemical activities were identified using 
pools of yeast strains containing the fusion proteins. The advantage 
of our protein chip approach is that all samples can be analysed in a 
single experiment. The fact that many protein kinases are active in 
the autophosphorylation assay indicates that at least some of the 
attached protein kinases retain enzymatic activity. 

We used microwells that have the advantage of reducing evaporation
and segregating samples, which is particularly useful for
solution-based reactions. Flat PDMS chips and glass slides, however,
can also be used for different assays at high density (H.Z. and M.S.,
unpublished data); these have the advantage that they can be used
with standard pinning tool microarrayers. This technology can also
be applied to facilitate high-throughput drug screening in which
one can screen for compounds that inhibit or activate enzymatic
activities of any gene products of interest. Because these assays will
be carried out at the protein level, the results will be more direct and
meaningful to the molecular function of the protein.

We configured the protein chip technology for a specific protein
kinase assay using commonly available sample handling and
recording equipment. For this purpose, array dimensions
remained relatively large compared with dimensions readily
available with micromolded silicone elastomer structures.
Thus, it should be possible to make micromolded protein chips
with microwell densities increased by several orders of magnitude
and carry out high-throughput biochemical assays using
arrays of 10,000 to 1,000,000 microwells using automatic sample
handling and measurement techniques.

We have developed an inexpensive, disposable protein chip technology
for high-throughput screening of protein biochemical activity.
Its usefulness was demonstrated through the analysis of 119
protein kinases from S. cerevisiae assayed for phosphorylation of 17
different substrates. These protein chips permit the simultaneous 
measurement of hundreds of protein samples. The use of micro- 
molded microwell arrays as the basis of the chip technology allows 
array densities to be increased by several orders of magnitude. With 
the development of appropriate sample handling and measurement 
techniques, these protein chips may be adapted for the simultane- 
ous assay of several thousand to millions of samples. 

Methods 

Cell culture, constructs and protein purification. Using a published
recombination strategy, we cloned 119 of 122 yeast protein kinase genes in a
high-copy URA3 expression vector (pEG(KG)) that produces GST fusion proteins
under the control of the galactose-inducible GAL1 promoter. Briefly,
primers complementary to the end of each ORF were purchased (Research
Genetics). The ends of these primers contain a common 20-bp sequence. In a
second round of PCR, we modified the ends of these products by adding
sequences that are homologous to the vector. The PCR products containing
the vector sequences at their ends were transformed along with the vector into
a pep4 yeast strain (which lacks several yeast proteases), and Ura+ colonies
were selected. Plasmids were rescued into E. coli, verified by restriction
endonuclease digestion and the DNA sequence spanning the vector-insert
junction was determined using a primer complementary to the vector. For the
GST::Cla4 construct, a frameshift mutation was found in a poly(A) stretch in
the amino-terminal coding region. Three independent clones were required
to find the correct one that maintained reading frame. For eight kinase genes
we were unable to obtain a PCR product, presumably because the genes were
large. For five of these genes two overlapping PCR products were obtained
and introduced into yeast cells. Confirmed plasmids were reintroduced into
the pep4 yeast strain for kinase protein purification.

For preparing samples using the 96-well format, we grew cells (0.75 ml)
in medium containing raffinose to OD600 ~0.5 in boxes containing 2 ml
wells; two wells were used for each strain. Galactose was added to a final
concentration of 4% to induce protein expression, and the cells were
incubated for 4 h. The cultures of the same strain were combined, washed once
with 500 µl lysis buffer, resuspended in 200 µl lysis buffer and transferred
into a 96×0.5 ml plate (Dot Scientific) containing 100 µl chilled glass beads.
Cells were lysed in the box by repeated vortexing at 4 °C and the GST fusion
proteins were purified from these strains using glutathione beads and
standard protocols in a 96-well format. The purity of five purified
GST::kinase proteins (Swe1, Ptk2, Pkh1, Hog1, Pbs2) was determined by
comparing the Coomassie staining patterns of the purified proteins with
the patterns obtained by immunoblot analysis using anti-GST antibodies.
The results indicated that the purified proteins are more than 90% pure. To
purify the activated form of Hog1p, cells were challenged with NaCl (0.4
M) in the last 5 min of the induction. Protein kinase activity was stable for
at least 2 months at -70 °C with little or no loss of kinase activity.

Chip fabrication and protein attachment. Chips were made from the
silicone elastomer PDMS (Dow Chemical) cast over micromachined molds.
Liquid PDMS was poured over the molds and, after curing (at least 4 h at
65 °C), flexible silicone elastomer array sheets were peeled from the
reusable molds. Although PDMS may be readily cast over
microlithographically fabricated structures, for the purposes of the kinase
assay described herein, molds made from sheets of acrylic patterned with a
computer-controlled laser milling tool (Universal Laser Systems) sufficed.

We tested over 30 different arrays. The variables tested were width and
depth of the wells (widths ranging from 100 µm to 2.5 mm, depths from 100
µm to 1 mm), spacing between wells (100 µm to 1 mm), configuration (either
rectangular arrays or closest packed) and microwell shape (square versus
round). The use of laser-milled acrylic molds offered a fast and inexpensive
method to realize a large number of prototype molds of varying parameters.

To determine the conditions that maximize protein attachment to the
wells, we treated PDMS with H2SO4 (5 M), NaOH (10 M), hydrogen
peroxide or a crosslinker GPTS (Aldrich; ref. 11). We have found that GPTS
treatment resulted in the greatest absorption of protein to the microwells
relative to untreated PDMS or PDMS treated other ways. Briefly, after
washing with 100% ethanol three times at RT, the chips were immersed in
1% GPTS solution (95% ethanol, 16 mM HOAc) with shaking for 1 h at
RT. After 3 washes with 95% ethanol, the chips were cured at 135 °C for 2 h
under vacuum. Cured chips can be stored in dry argon for months. To
attach proteins to the chips, protein solutions were added to the wells and
incubated on ice for 1-2 h. After rinsing with cold HEPES buffer (10 mM
HEPES, 100 mM NaCl, pH 7.0) three times, the wells were blocked with
1% BSA in PBS (Sigma) on ice for >1 h. Because of the use of GPTS, any
reagent containing primary amine groups was avoided.

To determine the concentration of proteins that can be crosslinked to the
treated PDMS, HRP anti-mouse Ig (Amersham) was attached to the chip using
serial dilutions of the enzyme. After extensive washing with PBS, the bound
antibodies were detected using an ECL kit (Amersham). We found that up to
8×10^-[exponent illegible] µg/µm² of protein can be attached to the surface;
a minimum of 8×10^-[exponent illegible] µg/µm² is required for detection by
our immunostaining methods.

Immunoblotting, kinase assay and data acquisition. GST::protein kinases
were tested for in vitro kinase activity using ³³Pγ-ATP. In the
autophosphorylation assay, the GST::kinases were directly adhered to
GPTS-treated PDMS and the in vitro reactions carried out with ³³Pγ-ATP in
appropriate buffer. In the substrate reactions, the substrate was adhered to
the wells, and the wells were washed with HEPES buffer and blocked with 1% BSA
before kinase, ³³Pγ-ATP and buffer were added. The total reaction volume
was kept below 0.5 µl per reaction. After incubation for 30 min at 30 °C, the
chips were washed extensively, and exposed to both X-ray film and a
Molecular Dynamics phosphoimager, which has a resolution of 50 µm and is
quantitative. For 12 substrates each kinase assay was repeated at least twice;
for the remaining 5 the assays were performed once.

Kinase sequence alignments and phylogenetic trees. Multiple sequence
alignments based on the core kinase catalytic domain subsequences of the
107 protein kinases were generated with the CLUSTAL W algorithm,
using the Gonnet 250 scoring matrix. Kinase catalytic domain sequences
were obtained from the SWISS-PROT (ref. 35), PIR (ref. 36) and GenBank
(ref. 37) databases. For those kinases whose catalytic domains are not yet
annotated (DBF4/YDR052C and SLN1/YIL147C), probable kinase
subsequences were inferred from alignments with other kinase subsequences in
the data set with the FASTA algorithm using the BLOSUM 50 scoring
matrix. Protein subsequences corresponding to the 11 core catalytic
subdomains were extracted from the alignments, and the phylogenetic trees
were computed with the PROTPARS (ref. 42) program (Fig. 5a).

Functional grouping of protein chip data. To visualize the approximate
functional relationships between protein kinases relative to the
experimental data, kinases were hierarchically ordered based on their ability
to phosphorylate the 12 different substrates (data available on web site). A
profile corresponding to the positive or negative activity of the 107 protein
kinases to each of the substrates was recorded, with discretized values in
[0,1]. Matrices were derived from the pairwise Hamming distances between
experimental profiles, and unrooted phylogenies were computed using the
Fitch-Margoliash least-squares estimation method as implemented in the
FITCH program of the PHYLIP software package. In each case, the
input order of taxa was randomized to negate any inherent bias in the
organization of the data set, and optimal hierarchies were obtained through
global rearrangements of the tree structures.
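
The distance step described above can be sketched as follows. This is only an illustration, assuming each kinase profile is a list of 0/1 values, one per substrate; the kinase names and profile values are hypothetical, and the tree-building step (FITCH in PHYLIP) is not reproduced here.

```python
from itertools import combinations


def hamming_distance(profile_a, profile_b):
    """Count the substrates on which two 0/1 activity profiles disagree."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same substrates")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def pairwise_distance_matrix(profiles):
    """Return (names, matrix); matrix[i][j] is the Hamming distance."""
    names = sorted(profiles)
    size = len(names)
    matrix = [[0] * size for _ in range(size)]
    for i, j in combinations(range(size), 2):
        d = hamming_distance(profiles[names[i]], profiles[names[j]])
        matrix[i][j] = matrix[j][i] = d  # distances are symmetric
    return names, matrix


# Hypothetical profiles over 12 substrates: 1 = phosphorylated, 0 = not.
example_profiles = {
    "kinase_A": [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    "kinase_B": [1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0],
    "kinase_C": [0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1],
}
names, matrix = pairwise_distance_matrix(example_profiles)
for name, row in zip(names, matrix):
    print(name, row)
```

A symmetric matrix of this kind is the sort of input a least-squares distance method expects; kinases with similar substrate profiles end up separated by small distances.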
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Abstract 

Despite the complexity of the subject of protein-alum interactions, valuable information can be obtained by analyzing
the adsorbed and desorbed protein by common physico-chemical methods. In the present work, to approach the adsorption
of hepatitis B surface antigen (HBsAg) on alum, the experimental data were supported by complementary analyses of the
adsorbed protein by immunoelectron microscopy and the desorbed protein by denaturing size-exclusion chromatography and
sodium dodecyl sulfate-polyacrylamide gel electrophoresis under reducing conditions. First, the depletion of HBsAg was
investigated. The aspects assessed were the conditions, recovery and chromatographic performance of the desorbed protein.
The results obtained strongly suggested the loss of particulate structure of HBsAg after adsorption on alum. This conclusion
was further reinforced by direct immunoelectron microscopic visualization of HBsAg in the adsorbed state. © 1998
Published by Elsevier Science B.V. All rights reserved.

Keywords: Hepatitis B surface antigen; Aluminium hydroxide

1. Introduction

According to World Health Organization estimates, by the year 2000,
there will be 400 million hepatitis B virus carriers in the world if
hepatitis B vaccine is not widely used [1]. Several yeast-derived
hepatitis B vaccines are commercially available now based on the same
recombinant hepatitis B surface antigen (HBsAg) and aluminium
hydroxide (alum) adjuvant. In numerous clinical trials, these
preparations have demonstrated an immunogenicity and efficacy similar
to that of plasma-derived antigen, with no difference in antibody
specificity and avidity [2-4]. However, using both plasma- and
yeast-derived vaccines, a small portion (about 5-10%) of vaccinated
subjects fail to produce detectable antibody response. Induction of
immune reactivity depends upon antigen reaching and being available in
lymphoid organs in a dose- and time-dependent manner [5]. Since HBsAg
is administered as the alum-adsorbed preparation, antigen-alum
interactions should be crucial in the immune response. This was
experimentally shown for gp120 protein [6]. As most proteins, gp120 is
quantitatively adsorbed on alum in a few minutes. However, the
gp120-alum interaction is weak and sensitive to anions from
physiological fluids. Due to this, the protein is rapidly desorbed
from alum just after injection, which explains a weak immune response
to the MN gp120 HIV-1 vaccine.

Our understanding of antigen-alum interactions may provide a way to
understand the non-responsiveness of some people to hepatitis B
vaccination. The analysis of antigen-alum interactions requires
knowledge of both the kinetic events that occur during the adsorption
event and the structure of adsorbed protein. However, neither of these
aspects has been explored for alum-adsorbed proteins due to the lack
of reliable experimental techniques capable of supporting kinetic
measurements as well as assessing the adsorbed protein at a molecular
level [Review, 7]. Despite the complexity of alum structure [8] and
great structural variety among antigens, protein adsorption on alum is
commonly approached by the Langmuir model originally developed for
small molecules adsorbing at an 'ideal' surface. This model cannot
describe the adsorption behavior of several antigen proteins [9],
including HBsAg [unpublished results]. However, without the stimulus
of usable data, no superior theoretical treatment has been developed.

Despite the complexity of the subject of protein-alum interactions,
valuable information can be obtained by analyzing the adsorbed and
desorbed protein by common physico-chemical methods. In the present
work, to approach the adsorption of hepatitis B surface antigen
(HBsAg) on alum, the experimental data were supported by complementary
analyses of the adsorbed protein by immunoelectron microscopy and the
desorbed protein by denaturing size exclusion chromatography (SEC) and
sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE).
The results obtained evidence the loss of particulate structure of
HBsAg after adsorption on alum. The same has been previously suggested
from the immunological studies of HBsAg [10] and other virus-like
particles [11,12]. Hence, our work provides physico-chemical support
for this hypothesis.



2. Experimental 

2.1. Materials

Tris(hydroxymethyl)aminomethane, dithiothreitol (DTT), SDS,
mercaptoethanol, sodium chloride, sodium phosphates and other
reagents mentioned were analytical grade and obtained from Merck
(Darmstadt, Germany). The reagents used in electron microscopy were
from Agar Scientific (Essex, UK). All solutions were made in Milli-Q
grade water. Aluminium hydroxide gel (alum) was purchased as a
sterile 2% (w/v) Al(OH)3 suspension from Superfos Biosector (Vedbaek,
Denmark). Hyflo Super Cel (celite) was from Fluka (Buchs,
Switzerland). Phosphate-buffered saline (PBS) contained 1.7 mM
KH2PO4, 7.9 mM Na2HPO4, 2.7 mM KCl and 250 mM NaCl, pH 7.0.
Recombinant HBsAg, cloned and expressed in yeast Pichia pastoris, was
purified by a multi-step procedure [13] and provided as a solution
in PBS (1.61 mg/ml) from the National Center for Bioproducts (Havana,
Cuba). This stock solution was used in the preparation of working
standard solutions of lower concentrations as well as in the
adsorption studies.

Anti-HBsAg mouse monoclonal antibody (CB Hep1) and protein
A-colloidal gold complex (particle diameter, 15 nm) used for the
immunodetection of HBsAg were provided by the Division of
Immunotechnology and Diagnostics of the Center for Genetic
Engineering and Biotechnology (Havana, Cuba).

The HBsAg-celite preparation (100 µg/ml HBsAg) used as a reference
standard in the electron microscopic study of HBsAg-alum preparation
was prepared as described in Ref. [14].

2.2. Apparatus 

2.2.1. Size-exclusion chromatography (SEC)

The SEC system included a Pharmacia LKB 2248 pump, Knauer degasser,
Pharmacia 2141 variable-wavelength UV detector operated at 280 nm and
Pharmacia 2221 programmable integrator. The column used was a TSK
G4000 SW (600×7.5 mm I.D.) purchased from Tosohaas (Stuttgart,
Germany). Elution was with 0.1 M Tris-HCl in 0.3% SDS, pH 8.0 at
0.9 ml/min. After injecting the working standard solutions (100 µl),
the conversion of peak areas to protein concentrations was carried
out using programmable integration.

2.2.2. SDS-PAGE

Electrophoresis (Hoefer Scientific Instruments) was performed as
described by Laemmli [15] on 12.5% gels at 30 mA for 3.5 h at room
temperature under reducing conditions. The gels were stained by
Coomassie blue dye (Bio-Rad, Richmond, CA, USA) or silver nitrate
[16]. The Coomassie-stained gel was scanned by laser densitometry
using Ultroscan XL (Pharmacia). For immunoblotting, the gel was
incubated with CB Hep1 antibody and developed with protein A
conjugated to aminobenzidine [17].

2.3. Preparation of HBsAg-alum

A mixture of 2% aluminium hydroxide gel (2 ml) and HBsAg stock
solution (1.7 ml) was gently agitated for 30 min at room temperature
and then diluted with PBS in a 25-ml volumetric flask. The
concentration of adsorbed HBsAg was 100 µg/ml, as determined by the
Lowry method [18] from protein balance. The placebo was prepared by
the dilution of 2% aluminium hydroxide gel (2 ml) with PBS up to
25 ml.
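
As a rough check on the preparation above, the nominal HBsAg concentration after dilution can be worked out from the stock concentration (1.61 mg/ml) and the volumes given. The sketch below is illustrative only; the function name is hypothetical, and it gives the total protein loaded, not the adsorbed amount, which the text reports from a Lowry protein balance.

```python
def diluted_concentration_ug_per_ml(stock_mg_per_ml, stock_volume_ml,
                                    final_volume_ml):
    """Nominal concentration (µg/ml) after diluting a stock to a final volume."""
    total_mg = stock_mg_per_ml * stock_volume_ml
    return total_mg / final_volume_ml * 1000.0  # mg/ml -> µg/ml


# Values taken from the text: 1.7 ml of 1.61 mg/ml stock, made up to 25 ml.
nominal = diluted_concentration_ug_per_ml(1.61, 1.7, 25.0)
print(f"nominal HBsAg concentration: {nominal:.1f} µg/ml")
```

The nominal figure comes out a little above the reported 100 µg/ml of adsorbed HBsAg, which is the relationship one would expect if a small part of the protein remained unadsorbed.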

2.4. Reduction of HBsAg particles

Aliquots (200 µl) from working standard solutions (0.2-1.6 mg/ml
HBsAg) were incubated with DTT/M sample buffer (40 µl) for 10 min at
100°C [the DTT/M sample buffer contained 417 mM DTT, 4.2% (w/v) SDS
and 16% 2-mercaptoethanol]. The reduced samples were analyzed by
SDS-PAGE/Coomassie blue staining (30 µl) and SEC (100 µl) as
described.

2.5. Desorption of HBsAg from alum 

One ml of the HBsAg-alum preparation (100 µg/ml) was centrifuged for
10 min at 2500 rpm. The pellet separated was incubated for 3 min at
100°C with a mixture of 0.4 M Na/PO4, pH 8.0 (100 µl)-DTT/M sample
buffer (20 µl) (5:1). After centrifugation for 5 min at 10 000 rpm,
the supernatant was analyzed by SDS-PAGE/Coomassie blue staining
(30 µl) and SEC (100 µl).

In order to determine the HBsAg recovery after 
desorption, the area of SEC-peak from the desorbed 
sample was extrapolated to the calibration curve 
previously generated by injections of the HBsAg 
working standard solutions. The determinations were 
carried out in duplicate. 
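
The recovery determination described above can be sketched as a straight-line calibration followed by a ratio. In this minimal example the standard concentrations span the 0.2-1.6 mg/ml working range, but the peak areas are hypothetical numbers chosen for illustration; the 0.24 and 0.5 mg/ml values are those quoted in the Fig. 4 caption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_area(area, slope, intercept):
    """Read a concentration off the calibration line for a measured area."""
    return (area - intercept) / slope


def recovery_percent(measured, expected):
    """Recovered amount as a percentage of the amount expected."""
    return 100.0 * measured / expected


# Hypothetical calibration: peak 2 areas for the working standards.
standards_mg_per_ml = [0.2, 0.4, 0.8, 1.6]
peak_areas = [200.0, 400.0, 800.0, 1600.0]  # arbitrary units, illustrative
slope, intercept = fit_line(standards_mg_per_ml, peak_areas)

desorbed = concentration_from_area(240.0, slope, intercept)  # hypothetical area
print(f"desorbed HBsAg: {desorbed:.2f} mg/ml")
print(f"recovery: {recovery_percent(desorbed, 0.5):.0f}%")
```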

In another experiment (Fig. 5), the HBsAg was desorbed from alum
using the described procedure, except the incubations at 100°C were
for 10, 15, 20 and 25 min, respectively. After that, the samples
were analyzed by SEC (100 µl) and SDS-PAGE/silver staining (1.7 µg
HBsAg per spot).

2.6. Transmission electron microscopy (TEM) 

Two drops of HBsAg solution in PBS (0.1 mg/ml) were placed for 5 min
onto a 400 mesh copper grid coated with formvar-carbon film. Excess
sample was blotted off. Grids were stained with uranyl acetate and
examined in a Jeol-JEM 2000EX transmission electron microscope,
acceleration voltage 80 kV and magnification 40 000×.

The adsorbed HBsAg was analyzed by sectioning the HBsAg-alum and
HBsAg-celite pellets obtained by centrifugation for 10 min at 2500
rpm of the respective preparations (10 ml). Placebo was used as a
blank. The pellets were fixed by immersion in 1% glutaraldehyde,
rinsed with PBS and dehydrated in increasing (30-100%) ethanol
concentrations. The embedding was in Araldite. Each resulting block
was sectioned (N=60) with an ultramicrotome LKB 2188 (NOVA) and
400-Å sections were mounted on 400 mesh nickel grids without
membrane. Staining was with uranyl acetate and lead citrate followed
by examination as described.

2.7. Immunoelectron microscopy

The grids coated with soluble HBsAg were floated on six drops of gold
buffer (1% BSA in PBS) before transfer to a drop of the PBS-diluted
CB Hep1 antibody (dilution, 1:20) for incubation at room temperature
(30 min). After washing in gold buffer to remove unbound antibody
molecules, the grids were floated on two drops of gold buffer-diluted
protein A-gold complexes (dilution, 1:200) for 40 min at room
temperature. Finally, the grids were washed in six drops of gold
buffer and two drops of distilled water. After staining with 1%
uranyl acetate, the grids were examined as described.

Similarly, the sections (N=12) from the placebo, HBsAg-alum and
HBsAg-celite pellets were incubated for 5 min on a drop of PBS
containing 1% ovalbumin before transfer to a drop of PBS-diluted
CB Hep1 monoclonal antibody (dilution 1:20). Incubation was for 60
min at room temperature. The grids were then rinsed for 5 min with
PBS to remove unbound antibody molecules and then incubated on a
drop of protein A-colloidal gold complex, pH 7.2, for 40 min at room
temperature. Finally, the sections were stained and examined as
described.



3. Results and discussion

3.1. Analysis of reduced HBsAg

Recombinant HBsAg is produced by the expression of a 226-amino acid
polypeptide in yeast cells where approximately 100 of these
polypeptides are assembled intracellularly into 22 nm lipoprotein
particles [19]. After purification, the assembled HBsAg particles are
detected by SEC, electron microscopy and enzyme-linked immunosorbent
assay (ELISA) using polyclonal antibodies [20]. To assess the HBsAg
monomer, the particles should be previously reduced. Efficient
reduction of HBsAg has been achieved by using a mixture of DTT and
mercaptoethanol (DTT/M), instead of one of them alone [21,22],
suggesting that these reducing agents do not possess the same
reducing power on multiple disulfide bonds within HBsAg particles.
Soluble HBsAg-SDS complexes formed after reduction are suitable for
further analysis by reversed-phase high-performance liquid
chromatography (RP-HPLC) [21] or denaturing SEC [22].

In denaturing SEC (Fig. 1), the reduced HBsAg is resolved into three
peaks: peak 1 corresponds to the co-elution of non-reduced HBsAg and
non-protein micellar aggregates, peak 2 to the HBsAg dimers and
monomers and peak 3 to lower-molecular-mass non-protein compounds.
Earlier studies have shown that peak 1 remaining after DTT/M
reduction is produced by non-protein aggregates, probably by
lipid-SDS complexes [22].

The correlation between the areas of peak 2 and the HBsAg
concentrations taken for reduction was linear in the range 0.2-1.6
mg/ml HBsAg (r=0.9991). The reproducibility was tested with three
replicate injections of the reduced stock solution of HBsAg on three
different days. The relative standard deviations (R.S.D.s) were
1.16-1.36% (within-days) and 1.25-1.65% (between-days).
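
The two figures of merit quoted above, the correlation coefficient r and the relative standard deviation, can be computed as in the sketch below. The replicate peak areas and calibration points are hypothetical and serve only to show the arithmetic; they are not the measured data.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def relative_standard_deviation(values):
    """R.S.D. in percent: sample standard deviation over the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical calibration points (mg/ml versus peak 2 area, arbitrary units).
concentrations = [0.2, 0.4, 0.8, 1.6]
areas = [205.0, 396.0, 810.0, 1590.0]
print(f"r = {pearson_r(concentrations, areas):.4f}")

# Hypothetical replicate injections of one reduced sample.
replicates = [98.0, 100.0, 102.0]
print(f"R.S.D. = {relative_standard_deviation(replicates):.2f}%")
```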

Fig. 1. Chromatogram of reduced HBsAg. Conditions: TSK G4000 SW
(600×7.5 mm I.D.); eluent, 0.1 M Tris-HCl containing 0.3% SDS, pH
8.0; flow rate, 0.9 ml/min; detection, UV at 280 nm; injection
volume, 100 µl; sample buffer, 417 mM DTT, 4.2% (w/v) SDS and 16%
(v/v) 2-mercaptoethanol. Peaks: 1=lipid-SDS aggregates, 2=reduced
HBsAg, 3=low-molecular-mass non-protein compounds.

The presence of shoulders before and after the maximum of peak 2
(Fig. 1) was a hint at the heterogeneity of HBsAg structures formed
after reduction. Indeed, as shown by SDS-PAGE/silver staining of
SEC-fractions from peak 2 (Fig. 2), the reduced HBsAg was represented
by dimers eluting presumably in the forward shoulder, monomers
eluting in the maximum of peak 2 and lower-molecular-mass proteins
eluting in the backward shoulder.

Fig. 2. SDS-PAGE/silver staining of DTT/M-reduced HBsAg (lane 1)
subjected to SEC fractionation (lanes 2-4): forward shoulder of peak
2 (lane 2), maximum of peak 2 (lane 3) and backward shoulder of peak
2 (lane 4). Experimental conditions as in text. The arrow indicates
the Mr 24 000 HBsAg monomer. Amount: 20 (lane 1) and 1.7 (lanes 2-4)
µg HBsAg.

The latter migrated ahead of the Mr 24 000 monomer band on the SDS
gel (Fig. 2, lane 4) and were recognized in immunoblotting (data not
shown). When HBsAg monomer from the maximum of peak 2 (Fig. 2, lane
3) was repeatedly reduced, no lower bands appeared, evidencing that
the observed lower bands are not generated by the reducing procedure.
We regarded these bands as degradation products. Similar degradation
bands were detected by immunoblotting of HBsAg expressed in S.
cerevisiae or H. polymorpha [23]. Since these bands were found not
only in purified material but also in crude HBsAg-containing yeast
extract [23], the detected degradation probably takes place in vivo.
The degradation products are particle-associated, because we could
not separate them from assembled particles by SEC under
non-denaturing conditions [data not shown]. Since HBsAg particles are
resistant to proteases [24], the degradation probably implies
covalent modifications of labile amino acids exposed on the particle
surface (Ser, Thr β-elimination and racemization, Asn deamidation,
Cys, Trp, Tyr oxidation or the hydrolysis of peptide bonds [25]).

3.2. Analysis of desorbed HBsAg 

To improve the immunogenicity of purified HBsAg particles, these are
adsorbed onto aluminium hydroxide gel in hepatitis B vaccine. Like
tetanus toxoid and diphtheria toxoid [9], HBsAg is adsorbed on alum
independently of pH and excess phosphate ions, and this adsorption is
irreversible under non-denaturing conditions [unpublished results].
In the present work, adsorbed HBsAg was recovered after the reduction
of HBsAg-alum pellet with a mixture of 0.4 M Na/PO4, pH 8.0-DTT/M
sample buffer (Fig. 3). The desorbed protein migrated presumably as a
Mr 24 000 monomer on the SDS gel. When known amounts of HBsAg were
reduced and applied on the gel, the intensities of the Mr 24 000
bands detected by laser densitometry were linearly related to the
HBsAg amounts (r=0.995). By extrapolating the intensity of the band
from the desorbed sample, the HBsAg recovery was estimated to be
43±4%. This value may be overestimated due to a significant
enlargement of the band from desorbed HBsAg compared to those from
HBsAg solutions (Fig. 3). As an alternative, the desorbed HBsAg was
determined by SEC. The chromatogram of desorbed HBsAg was quite
similar to that of HBsAg reduced in solution (Fig. 4). By plotting
the area of peak 2 from the desorbed sample on the calibration curve,
the HBsAg recovery was calculated to be 50±1%. Hence, both SDS-PAGE
and SEC indicate that a large portion of adsorbed HBsAg cannot be
recovered under reducing conditions.

With respect to the recoverable fraction of HBsAg, it was
progressively degraded as the reduction time increased from 3 to 25
min. In the chromatogram of desorbed HBsAg, the height of peak 2
gradually diminished, whereas the backward shoulder corresponding to
the elution of degraded polypeptides increased. After 20 min of
boiling, the desorbed HBsAg was present mainly as protein fragments.
Under the same conditions, the HBsAg monomer from intact particles
was stable and no changes in its chromatographic profile (Fig. 1)
were observed even after 30 min of boiling with DTT/M sample buffer.
The degradation detected by SEC for the recoverable fraction of
adsorbed HBsAg was almost undetectable by SDS-PAGE, where longer
boiling times led to a weak increase in the intensity of degradation
bands (Fig. 5, right). This is probably due to a highly intensive,
selective staining of HBsAg bands compared to that of degradation
bands.

Fig. 3. SDS-PAGE/Coomassie blue staining of HBsAg standard solutions
(lanes 1-4) and alum-desorbed sample. Experimental conditions as in
Section 2. The arrow indicates the Mr 24 000 HBsAg monomer. Amount:
15 (lane 1), 20 (lane 2), 25 (lane 3) and 30 (lane 4) µg HBsAg.
Desorbed sample: the value expected assuming the complete recovery of
HBsAg, 20 µg; the value calculated from the calibration curve, 9 µg.

In conclusion, the present study of HBsAg desorption raised two
important observations. First, only half of the adsorbed protein is
recovered under reducing conditions. Since the reducing conditions
are capable of efficiently disrupting any type of interaction,
whether ionic, hydrophobic and/or ligand-exchange in nature, the
non-recoverable fraction of HBsAg should be trapped in alum by an
additional mechanism, besides adsorption, that makes protein
molecules inaccessible to reducing buffer. If the particulate
structure of HBsAg were to be preserved in the adsorbed state, the
HBsAg monomers would be quantitatively recovered. Second, the
fraction of HBsAg recoverable under reducing conditions is prone to
temperature-induced degradation, unlike intact HBsAg. Since the
stability of a protein is determined by its three-dimensional
structure [25], the results indicate that the structure of HBsAg
particles is altered by alum adjuvant.

Fig. 4. Chromatogram of HBsAg reduced in solution (A) and desorbed
from alum under reducing conditions (B). Conditions as in Fig. 1;
concentration, 0.5 mg/ml HBsAg (A); injection volume, 100 µl.
Desorbed HBsAg: the value expected, assuming complete HBsAg recovery,
0.5 mg/ml HBsAg; the value calculated from the calibration curve,
0.24 mg/ml HBsAg.

The increasing attention to particulate polymeric antigens in the
form of virus-like particles is related to their ability to induce
specific, cell-mediated immune response in the absence of adjuvants
[Reviews, 26, 27]. Cytotoxic T lymphocytes (CTLs) provide a critical
arm of the immune system in eliminating autologous cells expressing
foreign antigen [28-30]. Although the mechanisms by which these
approaches lead to the induction of CTLs are unknown, it has been
suggested that the particulate nature of virus-like particles favors
their optimal delivery to the class I antigen presentation pathway
[11,12,31]. Unexpectedly, particulate antigens, including HBsAg,
failed to stimulate CTLs after their adsorption on alum, eliciting,
in contrast, substantial antibody titers [10,11]. Hence, a
supposition has been made that the particulate structure of
virus-like particles may be compromised by the adsorption on adjuvant
[31]. Our results from SEC and SDS-PAGE support this supposition. In
order to provide direct evidence for the loss of particulate
structure of HBsAg on alum, we analyzed the HBsAg-alum preparation by
immunoelectron microscopy.

Fig. 5. Chromatographic profile (left) and SDS-PAGE/silver staining pattern (right) of desorbed HBsAg after 10 (A), 15 (B), 20 (C) and 25
(D) min of boiling the HBsAg-alum pellet with DTT/M buffer. Chromatographic conditions as in Fig. 1; injection volume, 100 µl.
Electrophoretic conditions as in Section 2; amount, 1.7 µg HBsAg.

3.3. Immunoelectron visualization of adsorbed 
HBsAg 

The intact HBsAg was seen in the electron 
microscope as 22-nm spheres (Fig. 6A). After its 
adsorption on alum, no particles were found in the 
HBsAg-alum sections, suggesting alterations in the 
HBsAg morphology (Fig. 6B). In a few sections, 
structures were found resembling HBsAg particles 
but with a defect in particulation, suggesting particle 
damage (Fig. 6C and D). These structures were 
absent in the placebo sections (Fig. 6E). In order to 
prove that HBsAg structure is unaffected under the 
conditions of analysis, the HBsAg was adsorbed on 
celite and the HBsAg-celite sections were prepared 
as described. The HBsAg adsorption on celite is 
reversible [14]; thus, the HBsAg particles should be 
found in the HBsAg-celite sections. As expected, 
the HBsAg particles retained in the pores of celite 
were clearly visualized (Fig. 6F). The adsorbed 
HBsAg, undetectable by direct visualization, was 
identified here by immunoelectron microscopy 
(IEM) using CB-Hep 1 monoclonal antibody. One of 
the most interesting aspects of this approach is its 
potential to detect HBsAg independently of the state 
of folding. When the HBsAg-alum sections were 
incubated with CB-Hep 1 antibody followed by 
successive labeling with protein A-gold complex, 
these were specifically labeled, thus proving the 
presence of HBsAg on alum gel (Fig. 7A and B). 
The HBsAg appeared both in the dark-staining matrix 
structure of Al(OH)3 (Fig. 7A) and in the gel pores 
(Fig. 7B). It is essential that non-specific adsorption 
of gold on alum was minimal in the absence of 
HBsAg (Fig. 7C). 

Fig. 6. Electron micrographs of intact HBsAg (A), HBsAg-alum sections (B, C and D), placebo (E) and HBsAg-alum (F) sections. 
Experimental conditions as in Section 2. Scale bar is 200 nm. 

In IEM of intact HBsAg, the mapping of 15-nm 
gold onto 22-nm HBsAg particles produces a labeling 
pattern characterized by the presence of hoops 
around gold particles (Fig. 7D). This pattern was not 
observed after immunolabeling of HBsAg-alum 
sections (Fig. 7A and B), proving once more the loss 
of particulate structure of HBsAg after adsorption on 
alum. 

3.4. Proposed model 

The initial adsorption event is a function of the 
interfacial activity of various groups on a particle 
surface and on the adsorbent [32]. The active sites on 
alum are ascribed to aluminium ions (Lewis acids) or 
hydroxyls [7], whereas those on the HBsAg particle are 
represented by water-exposed phospholipids and 
hydrophilic protein domains [33]. The HBsAg-alum 
interaction is of non-ionic character, because it is 
insensitive to the conditions of pH and ionic 
strength. Apart from ionic interaction, proteins are 
adsorbed on alum by hydrophobic [34] and 
ligand-exchange [35,36] forces. The last mechanism 
comprises the chemisorptive binding of phosphate and 
carboxylate groups of a protein to the alum surface in 
accord with the Lewis acid-base model. Taking into 
account that phosphate is one of the strongest Lewis 
bases, the interaction of phosphate heads from 
phospholipids of HBsAg with unsaturated aluminium 
ions is highly expected. Phospholipids are the 
predominant lipid structures within HBsAg particles 
[37] and play a crucial role in the particle 
stabilization [38]. The lipid-protein interactions are 
responsible for the formation of the proper helical structure 
of the HBsAg protein, which disposes the remainder 
of the protein on the surface or interior of the 
particle. The lipid-protein interactions stabilize the 
conformation of the exterior hydrophilic regions 
which contain the HBsAg antigenic sites [38]. 
Hence, the adsorption of phospholipids on alum is 
expected to damage the integrity of the outer 
lipid monolayer, followed by rearrangements in 
hydrophobic protein domains and the lipid core inside the 
HBsAg particle. According to this model, the 
observed low recovery of HBsAg under reducing 
conditions can be explained by trapping of the 
non-recoverable protein monomers between alum and 
disordered lipids from the lipid core. On the other hand, 
the alterations in lipid-protein interactions within 
the HBsAg particle destabilize α-helices by the exposure 
of protein domains previously buried in the lipid bilayer 
[38]. Being less stable, the conformationally altered 
HBsAg monomers are more sensitive to degradation, 
which explains the observed temperature-induced 
degradation of the recoverable HBsAg monomers. 

Fig. 7. Protein A-gold electron microscopic localization of HBsAg in HBsAg-alum sections (A and B) and of gold on alum in the absence of HBsAg 
(C). Immunolabeling pattern of intact HBsAg (D). Experimental conditions as in Section 2. Scale bar is 200 nm. 

Abstract: Dense, ultrathin networks of isocyanate terminated star-shaped poly(ethylene oxide) (PEO) 
molecules, cross-linked at their chain ends via urea groups, were shown to be extremely resistant to 
unspecific adsorption of proteins while at the same time suitable for easy biocompatible modification. 
Application by spin coating offers a simple procedure for the preparation of minimally interacting surfaces 
that are functionalized by suitable linker groups to immobilize proteins in their native conformations. These 
coatings form a versatile basis for biofunctional and biomimetic surfaces. We have demonstrated their 
advantageous properties by using single-molecule fluorescence microscopy to study immobilized proteins 
under destabilizing conditions. Biotinylated ribonuclease H (RNase H) was labeled with a fluorescence 
resonance energy transfer (FRET) pair of fluorescent dyes and attached to the surface by a biotin-streptavidin 
linkage. FRET analysis demonstrated completely reversible denaturation/renaturation behavior upon 
exposure of the surface-immobilized proteins to 6 M guanidinium chloride (GdmCl) followed by washing in 
buffer. A comparison with bovine serum albumin (BSA) coated surfaces and linear PEO brush surfaces 
showed superior performance of the star-polymer derived films in terms of chemical stability, inertness and 
noninteracting nature. 



Introduction 

Chemically designed surface coatings that can prevent 
unspecific protein adsorption are essential for various 
biotechnological applications. Besides impeding biofouling, e.g., in 
membrane applications, nonadherent surface properties present 
a key condition for single molecule studies with immobilized 
proteins, for protein microarrays, and cell assays. To specifically 
bind proteins or to interact in a biomimetic way with living 
cells, surfaces have to be modified toward specific biological 
recognition, while at the same time avoiding uncontrolled 
adsorption that could lead to denaturation of proteins or 
unwanted activation of biological processes. While protein 
immobilization itself is easy to accomplish, it is of utmost 
importance in studies of protein folding and function that the 
interaction between the protein under study and the surface 
environment is minimized. Otherwise, the results may not reflect 
intrinsic properties of the protein, but rather artifacts due to 
surface interactions. A number of surface preparations have been 
developed for the purpose of protein resistance, such as 
self-assembled monolayers on gold, glass, silicon, titanium and 
titanium oxide, polyelectrolyte multilayer films and 
hydrogels. Poly(ethylene oxide) (PEO) brushes have been 
especially recognized as biocompatible and resistant to protein 
adsorption due to the hydrophilic but uncharged nature 
of the polymer. Still, these methods suffer either from 
insufficient protein repellence, preparation methods that are tedious 
and difficult to reproduce, or low surface functionality. Here, 
we focus on a versatile, easily applicable functional surface 
coating for immobilization of proteins, prepared with a dense, 
ultrathin network formed from star shaped PEO molecules, 
linked at their chain ends via urea groups. We address the 
question as to what extent destabilization of the natural folded 
conformation can be avoided by the proper choice of surface 
linkage. Denaturation and renaturation of a protein, RNase H, 
specifically linked via biotin/streptavidin to ultrathin, 
cross-linked star polymer layers was studied by attaching a 
donor-acceptor pair of fluorescent dye molecules at specific locations 
along the polypeptide chain so that the two dyes are in close 
proximity in the folded structure and further apart in the 
unfolded chain. The strong distance dependence of fluorescence 
resonance energy transfer (FRET) enables direct insights into 
the folding dynamics of the polypeptide chain. Our 
cross-linked star PEO surfaces showed superior performance 
compared with surfaces coated with linear PEO chains or 
pre-physisorbed proteins. 

Single-Molecule Studies of Protein Folding 

To be biologically active, the nascent polypeptide chain folds 
into a specific three-dimensional structure after biosynthesis. 
Due to the many possible internal degrees of freedom of the 
polymer, a large number of macroscopic pathways exist that 
connect the vast number of unfolded conformations with the 
much smaller ensemble of native, folded conformations. 
Therefore, protein folding is an inherently heterogeneous 
process. In recent years, the concept of a funnel-shaped 
conformational energy landscape has become prevalent, in which 
the folding polypeptide chain is guided toward the 
thermodynamic free energy minimum, encountering barriers and 
experiencing a gradual loss of enthalpy and entropy. 
Single-molecule studies can provide direct experimental evidence on 
the folding/unfolding pathways in the complex energy landscape 
of proteins. In recent years, FRET analysis has been applied in 
two-color confocal fluorescence studies on proteins diffusing 
freely in solution (refs 21, 22, 25). In this method, the observation time is 
limited to the time it takes for a molecule to diffuse through 
the detection volume, which is on the order of milliseconds. 
To gain information about slower processes, for example in 
studies of folding intermediates, it is necessary to immobilize 
the proteins. Various immobilization strategies have been 
reported for single-molecule spectroscopy, including trapping 
in porous polymer matrixes, unspecific adsorption to 
surfaces, as well as specific adsorption via complex 
coordination of His-tagged proteins, by using biotin/ 
(strept-)avidin coupling or charge interactions. One 
immobilization technique shown to yield minimal interaction with the 
environment is the trapping of proteins in 
surface-bound vesicles of ~100 nm diameter, in which they 
can freely diffuse within a limited volume. In this approach, 
however, it is not simple to control the solution conditions inside 
the vesicles in situ. The star polymer layers introduced here 
offer an easy and more versatile alternative since they are 
prepared by spin coating a solution of isocyanate terminated, 
six-arm star polymers from aqueous THF onto 
amino-functionalized substrates. They were examined for unspecific adsorption 
of RNase H and were found to be essentially nonadsorbing. In 
contrast to adsorbed biotinylated BSA films on hydrophilic glass 
and PEO brushes made with a small fraction (1%) of 
biotinylated PEO chains, unfolding/refolding of RNase H was 
completely reversible on the star polymer derived surfaces, which 
also offered high chemical stability. 



Experimental Section 

Synthesis and Labeling of RNase H. Plasmid pJAL135C 
containing the gene of a single cysteine mutant of RNase H was a generous 
gift from Prof. S. Kanaya (Osaka University, Osaka, Japan). The 
protein was overproduced in Escherichia coli HB101 and purified as 
described. The RNase H molecules were subsequently labeled 
with Alexa Fluor 546-NHS (Molecular Probes) and biotin-NHS 
(Sigma-Aldrich, St. Louis, MO). For FRET measurements, a mutant 
of RNase H was constructed that had cysteine residues at 
positions 3 and 135, the thiol side chains of which were labeled with 
the dye FRET pair (Alexa 546/Alexa 647) by maleimide 
coupling. 

Preparation of Isocyanate-Terminated Star PEO. 
Hydroxyl-terminated star polymers with 80% ethylene oxide and 20% 
propylene oxide as the backbone (number average molecular weight 12 000 
g/mol; polydispersity index 1.15) were functionalized through 
reaction with a 12-fold molar excess of isophorone diisocyanate (IPDI) in 
a solvent-free process at 50 °C for 5 d. The excess of IPDI was 
removed by short path distillation. Size exclusion chromatography of 
the product (star PEO) proved that no dimer or trimer formation took 
place. 

Surface Preparation and Specific Immobilization of RNase H. 
Cleaning, activation and aminofunctionalization of substrates were 
carried out under cleanroom conditions. Substrates were cleaned through 
sonication in acetone (Selectipur, Merck, Haar, Germany), 18.2 MΩ 
Millipore water and 2-propanol (Selectipur, Merck) for one minute each. 
After activation by an oxygen plasma, the substrates were 
aminofunctionalized under an inert gas atmosphere for 2 h in a solution of 0.2 
mL N-[3-(trimethoxysilyl)propyl]ethylenediamine (Sigma-Aldrich, 
97%) in 50 mL dry toluene. Then the substrates were washed thoroughly 
and stored under dry toluene until further usage. For spin coating, the 
substrates were placed on the spin coater, covered by the star polymer 
solution and then accelerated within 5 s to 2500 rpm for 40 s. The 
resulting films were stored overnight at ambient atmosphere for 
cross-linking. For biotinylation, biocytin was dissolved in 9 mL of deionized 
water. This solution was mixed with the star polymer solution in 1 mL 
of THF. Films were then prepared as described above. 

PEO surfaces were formed following cleaning and activation of glass 
substrates with a commercial aminosilane, Vectabond (Vector 
Laboratories, Burlingame, CA) according to the protocol recommended by 
the manufacturer. 100 mg/mL PEO solutions in 50 mM Na2CO3 buffer 
(pH 8.2) were prepared from mPEG-SPA, MW 5000 (Nektar 
Therapeutics, Huntsville, AL), or a mixture of Biotin-PEG-NHS, MW 3400 
(Nektar Therapeutics) and mPEG-SPA, MW 5000 with 1% PEO-biotin 
by weight. PEO was reacted with the Vectabond amino-functionalized 
surface for 1 h in the dark. After completion of the reaction, samples 
were thoroughly washed with 18.2 MΩ Millipore water. 

To form BSA covered surfaces, fluorescent contaminations were 
removed from untreated glass coverslips by brief exposure to an open 
flame. The surface was incubated with a 1 mg/mL BSA-biotin (Sigma- 
Aldrich) solution in 0.1 M sodium phosphate buffer (pH 7.4) for 10 
min, then washed with the same buffer and used immediately afterward. 

Biotinylated BSA/PEO/star polymer surfaces were exposed to 10 
µg/mL streptavidin (Sigma-Aldrich) in 0.1 M sodium phosphate buffer 
(pH 7.4) for 10 min. Afterward, the surfaces were incubated with ~100 
pM RNase H solution in buffer A (20 mM Tris-HCl, 100 mM KCl, 10 
mM MgCl2, pH 7.4) for 10 min. Finally, excess protein was washed 
away with buffer A. 

Single Molecule Measurements. Single-molecule microscopy was 
performed by using a homemade laser scanning confocal fluorescence 
microscope with Ar+/Kr+ laser (modified model 164, Spectra Physics, 
Mountain View, CA) excitation. It is based on an inverted microscope 
(Axiovert 35, Zeiss, Göttingen, Germany) and has two separate 
detection channels for measurement of the emission in two spectral 
channels to enable FRET experiments. The setup is described in detail 
by Heyes et al. 

Results and Discussion 

The cleaned and activated substrates were checked by 
scanning force microscopy (SFM, root-mean-square roughness 
< 0.2 nm for 1 µm scans) and contact angle (below the detection 
limit with water). Ellipsometry measurements showed an 
increase in the SiOx layer on the silicon of about 0.5 nm due to 
the activation step. The aminosilane layer exhibited a thickness 
between 1.1 and 1.5 nm and proved smooth when examined 
with SFM (root-mean-square roughness 0.6 nm for 1 µm scans). 

Spin coating of the star polymers from aqueous THF solutions 
resulted in smooth, homogeneous films. The isocyanate 
terminated stars started to react with water already in solution. Some 
of the isocyanate groups converted into amines, which reacted 
with isocyanate groups to yield di-, tri-, and higher oligomers 
of stars (Figure 1). Consequently, during spin coating, oligomers 
coexisted with monomers containing amine groups and 
unmodified monomers. Unreacted isocyanate groups can bind covalently 
to the amine groups on the aminosilanized substrate surface. 
Because of partial hydrolysis and cross-linking in solution, the 
deposited layer then reacts quickly to a highly cross-linked 
network. With the star polymer concentration chosen for these 
experiments (1 mg/mL), a film thickness of 5 ± 0.5 nm was 
measured by ellipsometry on silicon substrates, corresponding 
to at most three monolayers. The contact angle with water was 
determined by sessile drop measurements (advancing contact 
angle) as 52° ± 3° (θadv) and by captive bubble measurements 
after storing the samples for 12 h in deionized water (receding 
contact angle) as 45° ± 3° (θrec). Figure 2a shows the exquisite 
smoothness of the film, as examined by optical microscopy and 
SFM. 

Figure 1. Scheme of the cross-linking reaction of the polymer. 

Figure 2. Images of star polymer derived surfaces. (a) Optical microscopy 
(DIC) and scanning force microscopy (height image; profile height is 2 
nm) highlight the smoothness of the polymer films. (b) Fluorescence 
microscopy image of a star polymer covered substrate half-dipped into a 
polystyrene solution. Labeled streptavidin adsorbs unspecifically on 
polystyrene but not on the star polymer. Fluorescence intensities are ~1000 
s⁻¹ for the star polymer covered area (equal to background intensity) and 
> 15 000 s⁻¹ for the polystyrene covered area. 

Figure 3. Analysis of the density of fluorescent spots on freshly prepared 
surfaces and after exposure to RNase H molecules labeled with a single 
fluorophore. Significantly higher levels of unspecific adsorption were 
observed on BSA than on PEO surfaces. The star polymer surfaces showed 
negligible unspecific adsorption. 

To examine the resistance of the star polymer films to 
unspecific protein adsorption, the star polymer covered 
substrates were dipped halfway into a polystyrene solution in 
toluene. Polystyrene surfaces are known for strong unspecific 
adsorption of proteins. The so-created half polystyrene covered 
samples were immersed into protein solutions (streptavidin and 
avidin, labeled with fluorescent dyes) in different buffer systems 
(pH 5, pH 7.4, and pH 9.5). The proteins adsorbed onto the 
polystyrene, but not at all onto the star polymer coating (Figure 
2b). In a control experiment, unspecific protein adsorption on 
plain, aminosilanized wafers was shown to be high (not shown). 
Therefore, prevention of unspecific protein adsorption on the 
star polymer surfaces was demonstrated from pH 5 to 9.5, the 
pH range important for functional biomolecules. 

Unspecific adsorption was studied at the single-molecule level 
with unbiotinylated star-polymer surfaces and, for comparison, 
with physisorbed biotinylated BSA surfaces and with 
unbiotinylated PEO brush surfaces. All three samples were 
simultaneously incubated with the same ~15 nM solution of 
single-dye labeled RNase H in buffer A for 10 min and then thoroughly 
washed with buffer A. Because streptavidin was absent from 
the solution, specific binding to the biotinylated BSA was 
excluded. Control experiments of the surface cleanliness were 
performed to ensure that the observed fluorescent spots were 
from labeled RNase H and not from contamination. The amount 
of unspecific adsorption was calculated from the density of spots 
(single molecules) on each of the surfaces. Typically, twenty 
images (18 × 18 µm) of each sample were examined to obtain 
statistical significance. The data in Figure 3 show that the BSA 
surfaces were the cleanest in terms of background fluorescence, 
but they also showed the highest unspecific adsorption. The star 
polymer derived surfaces and PEO surfaces were slightly more 
contaminated, but showed markedly lower amounts of 
unspecifically adsorbed RNase H proteins. In fact, the density of spots 
did not increase at all from the background density on the star 
polymer layers, underscoring their highly adsorption-resistant 
nature. 
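
The spot-density comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' analysis code: the function names and the example counts are hypothetical, and only the 18 × 18 µm image size and the idea of comparing spot densities before and after protein exposure are taken from the text.

```python
from math import sqrt
from statistics import mean, stdev

IMAGE_SIDE_UM = 18.0  # image edge length stated in the text (18 x 18 um)


def spot_density(spot_counts, side_um=IMAGE_SIDE_UM):
    """Return (mean, standard error) of the spot density in spots per um^2.

    spot_counts holds one spot count per scanned image; at least two
    images are needed to estimate the standard error.
    """
    area_um2 = side_um * side_um
    densities = [count / area_um2 for count in spot_counts]
    sem = stdev(densities) / sqrt(len(densities))
    return mean(densities), sem


def adsorbed_density(before_counts, after_counts, side_um=IMAGE_SIDE_UM):
    """Density increase attributed to unspecific adsorption (spots per um^2)."""
    before, _ = spot_density(before_counts, side_um)
    after, _ = spot_density(after_counts, side_um)
    return after - before


# Hypothetical counts for illustration only, not data from the study.
background = [3, 4, 2, 3]
exposed = [3, 3, 4, 2]
print(adsorbed_density(background, exposed))
```

A difference close to zero, as reported for the star polymer layers, means the spot density after exposure is indistinguishable from the background density.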
Figure 4. Preparation of biotinylated, star pre-polymer derived layers with 
subsequent specific binding of streptavidin and RNase H. 

The star polymers were further modified with biotin anchors 
for specific attachment of RNase H molecules to the surface 
(Figure 4). To this end, biocytin was reacted with the 
isocyanate-terminated stars just prior to spin coating, yielding surfaces that 
were statistically decorated with biotin groups. These coatings 
were subsequently exposed to streptavidin (10 µg/mL in 
phosphate buffer for 10 min). Streptavidin is tetravalent to biotin, 
and biotinylated RNase H in buffer A was attached to the surface 
via the vacant binding sites on the already bound streptavidin. 

Figure 5. Two-color (red/green) images (top) and histograms of the number of molecules as a function of their FRET efficiencies (bottom) of RNase H 
bound to star polymer derived surfaces. One complete denaturation/renaturation cycle is shown, starting from the initial preparation in buffer solution (left) 
via the denatured state in 6 M GdmCl (middle) back to buffer solution (right). The changes in the distributions of FRET efficiencies indicate that the protein 
molecules unfold (bright green, E ≈ 0.3) and refold (red, E ≈ 1) completely reversibly. The peak at zero FRET efficiency (dark green) is due to RNase H 
molecules without a red dye molecule or with an already bleached one. 

Figure 6. FRET efficiency histograms (top) and relative protein densities on the surface (bottom) during one denaturation/renaturation cycle (0 M, 6 M, and 
finally 0 M GdmCl) of RNase H bound to a star polymer derived surface (left column), adsorbed BSA surface (middle column) and PEO 5000 polymer 
brush surface (right column). On PEO 5000 brush surfaces, protein denaturation is essentially irreversible. On BSA, labeled RNase H disappears from 
the surface under denaturing conditions, presumably together with the BSA layer. In contrast, RNase H can be unfolded and refolded completely reversibly 
on star polymer derived surfaces. Moreover, the spot density on the star polymer surface does not change after treatment with denaturant. 

Denaturation experiments were performed with RNase H 
molecules immobilized on the star polymer surface. To monitor 
the conformation of the polypeptide chain, a FRET pair of dye 
molecules was attached at positions 3 and 135 of the RNase H 
sequence. These locations were chosen such that the dye 
molecules are in close proximity in the native conformation and 
significantly further apart in the denatured state. The FRET 
efficiency of each individual molecule can be calculated from 
the intensities of photons in the red (acceptor) and green (donor) 
channels, I_a and I_d, respectively: 

    E = I_a / (I_a + γ I_d) 

The correction factor γ accounts for the difference in detection 
efficiency between the two channels. The FRET efficiency 
varies with the inverse sixth power of the dye-to-dye separation 
and is thus exquisitely sensitive to distance variations due to 
structural changes of the protein. Denaturation and subsequent 
renaturation of surface-bound RNase H was performed by 
variation of the concentration of guanidinium chloride (GdmCl) 
as a denaturing agent. Figure 5 (top) shows scan images taken 
under different solvent conditions with excitation at 514 nm. 
Each spot represents an individual RNase H molecule. In the 
initial scan, performed in buffer, the majority of spots emitted 
photons predominantly into the red spectral channel (sensitive 
only to emission from the red dye), reflecting a large FRET 
efficiency (E ≈ 1), and thus a close proximity of the dyes in 
the folded molecules. Upon exchange of the buffer with buffer 
containing 6 M GdmCl, most photons emitted from the spots 
were detected in the green spectral channel (sensitive to emission 
from the green dye) because of the lower FRET efficiency (E 
≈ 0.3) arising from the larger (average) dye-to-dye separation 
in the unfolded state. After re-exchange of the denaturant with 
buffer A, the spots turned red again, implying that the proteins 
refolded into the compact, high-FRET conformation. A small 
population of green spots prior to denaturation and a slightly 
increased number of green spots after renaturation represented 
molecules without red acceptor dye, because either it was 
lacking in the first place or it was photobleached during the 
experiment. These qualitative results were confirmed by a 
thorough quantitative analysis of the FRET efficiencies in Figure 
5 (bottom), which shows complete refolding on the star polymer 
surfaces. 
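
The intensity-based efficiency estimate and its distance dependence can be illustrated with a minimal Python sketch. It assumes the commonly used ratiometric form E = I_a / (I_a + γ I_d) and the standard Förster relation E = 1 / (1 + (r/R0)^6); the function names, the photon counts and the Förster radius R0 = 6 nm used in the example are hypothetical and are not values reported in the text.

```python
def fret_efficiency(i_acceptor, i_donor, gamma=1.0):
    """Ratiometric FRET efficiency E = I_a / (I_a + gamma * I_d).

    gamma corrects for the different detection efficiencies of the
    acceptor (red) and donor (green) channels.
    """
    total = i_acceptor + gamma * i_donor
    if total <= 0:
        raise ValueError("no photons detected in either channel")
    return i_acceptor / total


def efficiency_from_distance(r_nm, r0_nm):
    """Foerster relation E = 1 / (1 + (r / R0)^6)."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def distance_from_efficiency(efficiency, r0_nm):
    """Invert the Foerster relation; valid for 0 < efficiency < 1."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


# Hypothetical photon counts and Foerster radius, for illustration only.
R0_NM = 6.0
e_unfolded = fret_efficiency(i_acceptor=30, i_donor=70)  # E = 0.3
print(e_unfolded, distance_from_efficiency(e_unfolded, R0_NM))
```

Because of the sixth-power dependence, an efficiency of 0.3 corresponds to a dye separation only modestly larger than R0, while efficiencies near 1 require separations well below R0; this is why the folded and unfolded populations separate so clearly in the histograms.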

Figure 6 shows the comparison of the experiment presented 
in Figure 5 for the three surfaces examined. Together with the 
FRET data, we have also plotted the measured spot densities, 
normalized to unity for the first scan (in buffer). For the star 
polymer surfaces and the PEO brushes, the spot densities are 
relatively unaffected by the harsh chemical treatment with 6 M 
GdmCl. PEO brushes, however, are seen to completely prevent 
renaturation of the proteins after unfolding. Even in the first 
scan (in buffer), many molecules show low FRET efficiencies 
on linear PEO brush surfaces, due to partial or complete 
unfolding of the protein molecules. Most likely, the extended 
polypeptide chains interact and intermingle with the long PEO 
chains so that they cannot refold into their native conformation. 
Alternatively, bound RNase H may penetrate the flexible PEO 
brushes and interact with the underlying aminosilane. In any 
case, this process is apparently prevented by the extensively 
cross-linked star polymers. On the BSA surfaces, a substantial 
fraction of RNase H refolds, as seen from the recurrence of the 
maximum at high FRET efficiencies. Yet, the broad pedestal 
at intermediate FRET efficiencies suggests that some fraction 
of the molecules did not refold properly. Moreover, 
measurement of the spot (protein) density on the surfaces revealed that 
the protein concentration on BSA surfaces was reduced 
significantly under 6 M GdmCl. This observation implies that a 
large fraction of RNase H proteins was removed by treatment 
with GdmCl, most likely together with the BSA layer 
physisorbed to the glass. 

Conclusions 

Ultrathin, smooth layers from isocyanate terminated star 
polymers on glass substrates were shown to be extremely 
resistant to unspecific adsorption of proteins while at the same 
time suitable for easy chemical modification. Application by 
spin coating offers a simple procedure for the preparation of 
minimally interacting surfaces that are functionalized by suitable 
linker groups to immobilize proteins in their active natural 
conformation. These coatings form a versatile platform for 
biofunctional and biomimetic surfaces and single-molecule 
fluorescence microscopy studies on immobilized proteins. In 
single-molecule denaturation/renaturation experiments with 
RNase H molecules specifically attached to the star polymers, 
complete reversibility of this process was observed, implying 
minimal interaction between the protein and the surface. A 
comparison with adsorbed BSA and PEO brush surfaces clearly 
demonstrated the superior quality of the star polymer layers. 

ABSTRACT 

We have developed a technique to establish catalogues 
of protein products of arrayed cDNA clones identified 
by DNA hybridisation or sequencing. A human fetal 
brain cDNA library was directionally cloned in a 
bacterial vector that allows IPTG-inducible expression 
of His6-tagged fusion proteins. Using robot technology, 
the library was arrayed in microtitre plates and gridded 
onto high-density in situ filters. A monoclonal antibody 
recognising the N-terminal RGSH6 sequence of 
expressed proteins (RGS-His antibody, Qiagen) detected 
20% of the library as putative expression clones. Two 
example genes, GAPDH and HSP90α, were identified 
on high-density filters using DNA probes and antibodies 
against their proteins. 

For construction of the human expression library hEx1, cDNA 
was prepared from fetal brain poly(A)+ RNA by 
oligo(dT)-priming (Superscript Plasmid System, Life Technologies). Products 
were size-fractionated by gel filtration and directionally (SalI-NotI) 
cloned into a modified pQE-30 (Qiagen) vector for IPTG-inducible 
expression of His6-tagged fusion proteins (pQE-30NST, GenBank 
accession no. AF074376). Escherichia coli SCS1 cells (Stratagene) 
carrying the plasmid pSE111 with the lacIQ repressor and the 
argU gene for a rare arginine tRNA (1) were transformed by 
electroporation. PCR analysis of 96 clones revealed an average 
insert size of ~1.5 kb (range 0.5-5.0 kb). 

The library was plated onto 2xYT-AKG agar plates (230 mm x 
230 mm Nunc Bio Assay Dishes containing 2xYT agar, 100 µg/ml 
ampicillin, 15 µg/ml kanamycin and 2% glucose) and grown at 
37°C overnight. Using a picking/gridding robot (2), 193 536 
colonies were picked into 384-well microtitre plates (Genetix) 
containing 2xYT-AKG medium supplemented with freezing mix 
(0.4 mM MgSO4, 1.5 mM Na3-citrate, 6.8 mM (NH4)2SO4, 3.6% 
glycerol, 13 mM KH2PO4, 27 mM K2HPO4, pH 7.0). Bacteria were 
grown in microtitre wells at 37°C overnight, and 9216 or 27 648 
clones were gridded onto 222 mm x 222 mm filter membranes in 
a duplicate pattern (Fig. 1). Nylon filters (Hybond-N+, Amersham) 
were gridded for DNA hybridisations and processed as described 
(3). For protein analysis, polyvinylidene difluoride (PVDF) 
filters (Hybond-P, Amersham) were gridded, incubated on 
2xYT-AKG agar plates at 30°C overnight and induced for protein 
expression for 3 h at 37°C on agar plates containing 1 mM IPTG. 
These protein filters were processed on pre-soaked blotting paper, 
i.e., denatured in 0.5 M NaOH, 1.5 M NaCl for 10 min, 
neutralised for 2x 5 min in 1 M Tris-HCl, pH 7.5, 1.5 M NaCl and 
incubated for 15 min in 2x SSC. Filters were air-dried and stored 
at room temperature. 
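
The plate and filter numbers quoted above can be cross-checked with a short Python sketch. It rests on two assumptions that go beyond the text: that each gridding block is an n x n square whose central position is taken by the ink guide dot mentioned in the Figure 1 caption, and that the 9216-clone filters use the 3x3 pattern while the 27 648-clone filters use the 5x5 pattern. The function names are hypothetical.

```python
WELLS_PER_PLATE = 384  # microtitre plate format stated in the text


def plates_needed(n_clones, wells=WELLS_PER_PLATE):
    """Number of microtitre plates required to hold n_clones (rounded up)."""
    return -(-n_clones // wells)


def guide_dot_blocks(n_clones, block_side, replicates=2):
    """Number of block_side x block_side blocks needed on one filter.

    Assumes one position per block is reserved for the ink guide dot and
    every clone is spotted `replicates` times (duplicate pattern).
    """
    spots_per_block = block_side * block_side - 1
    total_spots = n_clones * replicates
    blocks, remainder = divmod(total_spots, spots_per_block)
    return blocks + (1 if remainder else 0)


print(plates_needed(193536))       # 504
print(guide_dot_blocks(9216, 3))   # 2304
print(guide_dot_blocks(27648, 5))  # 2304
```

Under these assumptions both filter densities fill the same number of blocks, which would be consistent with a fixed gridding head visiting the same block positions at either density.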

For global protein expression, high-density filters were
screened with the monoclonal antibody RGS His (Qiagen). This
antibody recognises the N-terminal sequence RGSH6 of fusion
proteins over-expressed from pQE-30 vectors and labelled ~20%
of the hEx1 clones (Fig. 1A). Negative clones have inserts in
incorrect reading frames with stop codons leading to short
polypeptides that cannot fold into stable structures and are
degraded within the host cell (4). Two example proteins, GAPDH
(35.9 kDa, Swiss-Prot P04406) and HSP90α (84.5 kDa, Swiss-Prot
P07900) were chosen for detailed analysis. A set of three DNA
filters (80 640 clones) were screened with cDNA probes. Two
hundred and six (0.26%) clones were positive with a human
GAPDH probe (Fig. 1B), and 56 (0.07%) clones were identified
with a human HSP90α probe. About 25% of these clones were
positive with the RGS-His antibody. To confirm the expression of
GAPDH or HSP90α proteins by these clones, protein filters were
screened with antibodies against GAPDH (Fig. 1C) or HSP90α,
respectively. Fifty-seven percent of the GAPDH and 72% of the
HSP90α clones detected by the RGS His antibody were also
positive with the protein-specific antibodies. Sequence analysis
showed that the remaining clones had inserts in an incorrect
reading frame or expressed truncated GAPDH which reacted
poorly with the GAPDH antibody.
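
The hit rates quoted for the two cDNA probes can be checked directly from the clone counts given in the text. This is a minimal sketch of that check; the helper name is illustrative.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


CLONES_SCREENED = 80_640          # three DNA filters, as stated in the text
HITS = {"GAPDH": 206, "HSP90alpha": 56}

if __name__ == "__main__":
    for probe, n in HITS.items():
        print(f"{probe}: {n} / {CLONES_SCREENED} = {percent(n, CLONES_SCREENED):.2f}%")
```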

In turn, 100% of the anti-GAPDH- but only 35% of the
anti-HSP90α-positive clones were detected by the RGS-His
antibody. All RGS-His-negative HSP90α clones had inserts in
incorrect reading frames but nevertheless expressed proteins
detected by the HSP90α antibody on western blots (data not
shown). This indicates HSP90α molecules without a His6 tag,
suggesting translational start sites within cDNA inserts. Three
anti-HSP90α positive clones contained inserts that were not
recognised by the cDNA probe and turned out to be unrelated
sequences. These sequences were analysed using BESTFIT
(Wisconsin Package Version 9.1, Genetics Computer Group,
Madison) but no common motifs of significant homology were
found. This limited antibody specificity is not surprising as it
reflects cross-reactivity which is not usually tested against a
whole library of proteins as in our method.

Figure 1. Identification of cDNA clones expressing recombinant fusion proteins on high-density filters. (A) RGS His antibody detection (gridding pattern of 3x3
surrounding ink guide dots as shown in lower right corner). (B) DNA hybridisation with a GAPDH cDNA probe, as described (3). (C) Screening with a polyclonal
anti-GAPDH antibody (corresponding sections of filters featuring 5x5 gridding patterns as shown in upper right corner, identical clones are circled). Before antibody
screening, filters were soaked in ethanol, bacterial debris was wiped off in TBST-T (20 mM Tris-HCl pH 7.5, 0.5 M NaCl, 0.1% Tween 20, 0.5% Triton X-100),
followed by washing 2x 10 min in TBST-T and 10 min in TBS. Filters were blocked in blocking buffer (3% non-fat, dry milk powder in TBS, 150 mM NaCl, 10 mM
Tris-HCl, pH 7.5) for 1 h and incubated with antibody for 2 h (1:2000 diluted monoclonal RGS His, Qiagen, or 50 ng/ml monoclonal anti-HSP90α, Transduction
Laboratories, Lexington) or for 16 h (1:5000 diluted rabbit anti-GAPDH). After washing for 2x 10 min in TBST-T and 10 min in TBS, filters were incubated with alkaline
phosphatase-conjugated anti-mouse or anti-rabbit Ig (Pierce) for 1 h, washed 3x 10 min in TBST-T, 10 min in TBS and 10 min in AP buffer (1 mM MgCl2, 0.1 M Tris-HCl,
pH 9.5) and incubated in 0.25 mM Attophos (JBL Scientific, San Luis Obispo) in AP buffer for 5 min. Filters were illuminated with long-wave UV light and images were
taken using a high resolution CCD detection system. Image analysis was done using Xdigitise software (written by Huw Griffith) which is available on request.

The main advancement of our technique over existing technology
(e.g. λgt11 libraries; 5) is its high-throughput link between DNA
sequence information and protein expression as a resource for
unlimited future use. Having screened a library for protein
expression once, we can always go back and identify products of
new genes as they are discovered, attributing first functional
information to them. Based on screenings with the RGS His
antibody, 37 830 putative expression clones were re-arrayed into
new microtitre plates, and high-density protein and DNA filters
were prepared and are available from the Resource Centre of the
German Human Genome Project (http://www.rzpd.de).

The main technical problems of our approach are inherent in
cDNA library and filter hybridisation technology. Oligo(dT)-primed
cDNA is biased towards 3'-ends of genes, and, subject to insert
size, N-terminal parts of larger proteins are often missing. To
include a maximum number of epitopes for antibody screening,
complementary random-primed cDNA libraries should be used.
Quantification of signal intensities on filters is largely based on
arbitrary thresholds for manual or automated image analysis.
Therefore, our approach is exclusively based on positive clones
to be confirmed by sequencing and/or protein characterisation.

We envisage two main fields of application for our method.
First, catalogues of protein products can be established for
different tissues and developmental stages. As these proteins are
expressed from arrayed cDNA clones, their identity can easily be
checked by high-throughput gene identification techniques
(e.g. oligonucleotide fingerprinting; 6). Therefore, gene expression
patterns of normal and diseased tissues can be translated to the
protein level, keeping a direct link to already existing DNA
sequence data. Second, our method should also enable
high-throughput analysis of antibody specificity and other
protein-protein interaction or ligand-receptor systems (7),
including non-protein molecules.

We have constructed a human fetal brain cDNA library
in an Escherichia coli expression vector for
high-throughput screening of recombinant human
proteins. Using robot technology, the library was
arrayed in microtiter plates and gridded onto
high-density filter membranes. Putative expression clones were
detected on the filters using an antibody against the
N-terminal sequence RGS-His6 of fusion proteins.
Positive clones were rearrayed into a new sublibrary, and
96 randomly chosen clones were analyzed. Expression
products were analyzed by SDS-PAGE, affinity
purification, matrix-assisted laser desorption/ionization-
time-of-flight mass spectrometry, and the determined
protein masses were compared to masses predicted
from DNA sequencing data. It was found that 66% of
these clones contained inserts in a correct reading
frame. Sixty-four percent of the correct reading frame
clones comprised the complete coding sequence of a
human protein. High-throughput microtiter plate
methods were developed for protein expression,
extraction, purification, and mass spectrometric
analyses. An enzyme assay for glyceraldehyde-3-phosphate
dehydrogenase activity in native extracts was adapted
to the microtiter plate format. Our data indicate that
high-throughput screening of an arrayed protein
expression library is an economical way of generating
large numbers of clones producing recombinant
human proteins for structural and functional
analyses. © 2000 Academic Press



INTRODUCTION 

Cellular functions are controlled by the networked
expression of gene catalogues. Functional network
analysis requires the parallel expression and
characterization of large numbers of gene products.
Structural analysis provides clues to biochemical functions
of unknown proteins (Hwang et al., 1999; Zarembinski
et al., 1998). Genome analysis by DNA hybridization
and sequencing has become a highly automated
process (Lehrach et al., 1997). In contrast, the
individuality of protein molecules demands highly customized
procedures for their expression. Automation of these
procedures requires systems that allow the efficient
handling of large numbers of clones representing many
different proteins. Bacterial systems are easy to
manage but the expression of eukaryotic proteins can be
problematic, due to aggregation, formation of insoluble
inclusion bodies, and/or degradation of the expression
product (Hockney, 1994; Makrides, 1996). Eukaryotic
systems suffer from lower yields of heterologous
protein (e.g., Saccharomyces cerevisiae; Buckholz and
Gleeson, 1991), high demands on sterility (e.g., mammalian
systems; Aruffo, 1997; Kingston et al., 1997), or
time-consuming cloning procedures (e.g., Baculovirus
system; Miller, 1993).

1 To whom correspondence should be addressed. Telephone: +49-30-
8413-1614. Fax: +49-30-8413-1128. E-mail: buessow@molgen.mpg.de.

We have shown that automated technology can be
used for high-throughput protein expression screening
(Büssow et al., 1998; Lueking et al., 1999). Mammalian
cDNA libraries are directly cloned into bacterial
expression vectors, circumventing the subcloning of
individual protein-coding sequences. In a first screening
round, putative protein-expressing clones are
identified on high-density filters using antibodies against a
vector-encoded tag sequence. The detected clones are
rearrayed into a smaller sublibrary. In a second round,
small-scale protein expression is performed in
microtiter plates. Products are analyzed for size, yield,
homogeneity, and solubility using SDS-PAGE, affinity
purification, and matrix-assisted laser desorption/
ionization-time-of-flight mass spectrometry (MALDI-
TOF-MS). Expression levels of large numbers of clones
are assessed in parallel to find the most suitable for
high-throughput structural analyses by X-ray
crystallography or NMR and functional screening. In a third
round, protein function is also assayed in the
microtiter plate format. As an example, bacterial lysates of 96
clones were screened for expression of glyceraldehyde-
3-phosphate dehydrogenase (GAPDH) activity. In
summary, our multistep screening approach enables the
generation of an expression clone catalogue of human
proteins as a resource for structural and functional
genomic analyses.

MATERIALS AND METHODS

cDNA library and protein expression screening on high-density
filters. A cDNA library (hEx1) from human fetal brain tissues was
cloned in the expression vector pQE30NST (GenBank Accession No.
AF074376). High-density protein filters were prepared and were
screened with the RGS·His antibody (Qiagen), as described (Büssow
et al., 1998). In total, 193,536 clones of the hEx1 library were picked
and labeled according to the RZPD nomenclature
(http://www.rzpd.de). Clone names contain the library number
MPMGp800 as a prefix.

PCR amplification and sequencing. cDNA inserts were amplified
using primers pQE65 (TGAGCGGATA ACAATTTCAC ACAG) and
pQE276 (CGCAACCGAC CCTTCTGAAC) at an annealing
temperature of 65°C. PCR products were tag-sequenced using primer
pQE65.
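
As a quick sanity check on the two primers, their length, GC content and a rough melting temperature can be computed from the sequences printed above. The sketch uses the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a coarse estimate for oligonucleotides of this length and is not stated in the paper; the function names are illustrative.

```python
PRIMERS = {
    "pQE65": "TGAGCGGATAACAATTTCACACAG",   # sequence as printed, spaces removed
    "pQE276": "CGCAACCGACCCTTCTGAAC",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule (an approximation)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), "nt,", f"GC {gc_fraction(seq):.0%},",
              "Wallace Tm ~", wallace_tm(seq), "C")
```

Both estimates land within a few degrees of the 65°C annealing temperature used in the protocol, which is all such a rule of thumb can show.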

Protein expression and nickel chelate affinity chromatography in
microtiter plates. Proteins were expressed in 1-ml cultures in
deep-well microtiter plates, and protein extracts were obtained as
described (Lueking et al., 1999). Twenty-five microliters of 50% Ni-NTA
agarose was added to protein extracts obtained under denaturing
conditions, and His6-tagged proteins were bound by shaking for 1 h
in a microtiter plate shaker. The agarose beads were washed three
times by resuspending them in Buffer C (8 M urea, 0.1 M NaH2PO4,
0.01 M Tris, pH 6.3), shaking for 5 min, and removal of liquid on the
vacuum filtration manifold. Buffer C was removed by washing four
times with 200 µl of 5 mM Tris-HCl, pH 8.0. Proteins were eluted by
adding 100 µl of 35% acetonitrile, 0.1% TFA, shaking for 10 min,
followed by centrifugation at 2000 rpm for 2 min, and collection of
eluates in a fresh 96-well microtiter plate. Five microliters of the
eluates was analyzed by SDS-PAGE, and 0.5-µl aliquots were
subjected to MALDI-TOF-MS analysis.

MALDI-TOF-MS analyses. Aliquots (0.5 µl) of protein eluates
were loaded onto a Bruker Scout-384 MALDI sample support (384
sample positions arranged according to the microtiter plate format),
followed by addition of 0.5 µl sinapic acid matrix solution (saturated
in 35% acetonitrile). The samples were deposited onto the central
positions E7-E18...L7-L18. In addition, a protein calibration
standard containing 0.5 pmol horse heart cytochrome c and 1 pmol
human carbonic anhydrase was placed between the sample positions
H12, I12, H13, and I13. All samples were analyzed on a Bruker Scout
384 Biflex III MALDI-TOF mass spectrometer in linear operational
mode using externally determined calibration constants. Exclusively
positively charged ions were detected, and 100-150 single-shot
spectra were accumulated for improved signal-to-noise ratio. Before the
analysis, the instrumental parameters were optimized for good
signal resolution in the mass range 10-30 kDa using the protein
calibration standard, and external calibration constants were
determined using the molecular ion signals of cytochrome c and human
carbonic anhydrase I. If indicated by the above measurements,
proteins contained in 10 µl eluate were neutralized and reduced by
addition of 2 µl containing 500 mM Tris-HCl, pH 7.5, 50 mM DTT
and incubated at 55°C for 30 min. Aliquots (0.5 µl) of these solutions
were deposited onto the MALDI sample support followed by 0.5 µl
sinapic acid matrix solution containing also 2.5% TFA. After solvent
evaporation, these samples were analyzed as described above.
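
The block of central positions named above (rows E to L, columns 7 to 18) holds exactly one 96-well plate's worth of samples on the 384-position support. The short sketch below enumerates those positions; the function name and defaults are illustrative and only encode the range given in the text.

```python
import string


def central_positions(first_row="E", last_row="L", first_col=7, last_col=18):
    """List target positions such as 'E7' ... 'L18' in row-major order."""
    letters = string.ascii_uppercase
    rows = letters[letters.index(first_row): letters.index(last_row) + 1]
    return [f"{row}{col}" for row in rows for col in range(first_col, last_col + 1)]


if __name__ == "__main__":
    positions = central_positions()
    print(len(positions), "positions:", positions[0], "...", positions[-1])
```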

GAPDH assay. The GAPDH assay described by Heinz and
Freimüller (1982) was adapted to the microtiter format and performed in
duplicate. One hundred fifty microliters of assay mix (33 mM TEA-
HCl, 0.23 mM NADH, 6.7 mM MgSO4, 1 mM ATP, 3 mM glycerate
3-phosphate, 3.8 mM L-cysteine) was added to 1-µl soluble protein
fractions diluted 1:10, in 96-well microtiter plates (Microtest III,
Falcon). The decrease of A340 was measured with a microtiter plate
photometer (Spectramax 250, Molecular Devices). One unit of
GAPDH catalyzes the reduction of 1 µmol of 1,3-diphosphoglycerate
to D-glyceraldehyde-3-phosphate per minute.
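
The unit definition above can be connected to the measured absorbance change through the Beer-Lambert law, since one NADH is oxidised per substrate molecule reduced. The sketch below is an illustration under stated assumptions: the molar absorptivity of NADH at 340 nm (6.22 per mM per cm) is the standard literature value, while the optical path length in a microtiter well is not given in the text and is a hypothetical input here.

```python
NADH_EPSILON_340 = 6.22  # mM^-1 cm^-1, standard literature value for NADH at 340 nm


def activity_units(delta_a340_per_min: float, volume_ml: float, path_cm: float) -> float:
    """Enzyme activity in units (umol substrate converted per minute) in one well.

    delta_a340_per_min: magnitude of the absorbance decrease per minute
    volume_ml:          reaction volume in the well
    path_cm:            optical path length (assumed; depends on plate and fill volume)
    """
    rate_mM_per_min = delta_a340_per_min / (NADH_EPSILON_340 * path_cm)
    return rate_mM_per_min * volume_ml  # mM x ml = umol


if __name__ == "__main__":
    # 0.151 ml = 150 ul assay mix + 1 ul sample; 0.45 cm is a hypothetical path length
    print(activity_units(delta_a340_per_min=0.05, volume_ml=0.151, path_cm=0.45))
```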



RESULTS 

A human fetal brain cDNA expression library (hEx1)
was constructed in the vector pQE30NST, which allows
the expression of fusion proteins with the N-terminal
sequence RGS-His6 (Büssow et al., 1998). Briefly,
193,536 clones were picked into 384-well microtiter
plates (plates 1 to 504 of hEx1) and gridded as
high-density protein filters using a robotic system (Lehrach
et al., 1997). These clone arrays were screened for
putative protein expression clones using the
monoclonal antibody RGS·His (Qiagen), which recognizes the
N-terminal sequence RGS-His6 of recombinant
expression products. The antibody preferentially labels clones
containing a cDNA insert in-frame with RGS-His6. In
alternative reading frames, stop codons cause the
expression of short and unstable products that are
degraded in the Escherichia coli host cell (Gottesman,
1996). A total of 37,830 (19.6%) clones were recognized
by the RGS·His antibody, 67% of which were labeled
with medium or high intensity. All positive clones were
combined in a new library by rearraying in 99 x 384-
well microtiter plates (labeled plates 505 to 603 of
hEx1) using the same robotic system equipped with
dedicated rearraying software (Büssow et al., 1998).
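
The size of the rearrayed sublibrary follows from the number of antibody-positive clones. A minimal sketch of the arithmetic, with illustrative names; note that the directly computed positive fraction comes out at about 19.5%, marginally below the rounded figure quoted in the text.

```python
import math

PICKED = 193_536      # clones picked into plates 1 to 504
POSITIVE = 37_830     # clones recognised by the RGS-His antibody
WELLS = 384


def positive_fraction(positive: int = POSITIVE, picked: int = PICKED) -> float:
    """Share of picked clones that were antibody-positive, in percent."""
    return 100.0 * positive / picked


def rearray_plates(positive: int = POSITIVE, wells: int = WELLS) -> int:
    """Plates needed to hold all positive clones at one clone per well."""
    return math.ceil(positive / wells)


if __name__ == "__main__":
    print(f"{positive_fraction():.2f}% positive, {rearray_plates()} plates")
```

Ninety-nine plates numbered consecutively after plate 504 give exactly the range 505 to 603 stated above.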

cDNA inserts of 96 randomly chosen clones of the
medium and high RGS·His signal intensity groups
were sequenced. All 96 clones originated from plate
582 of the rearrayed hEx1 library. cDNA inserts were
amplified by PCR and analyzed by 5'-tag sequencing.
An average insert size of 1.5 kb was determined. 5'-tag
sequences of 93 cDNA inserts were obtained and used
to search SP-TrEMBL, the combined SWISS-PROT
and TrEMBL protein database (Bairoch and Apweiler,
1998) using the program BLASTX (Altschul et al.,
1990). Fifty-nine sequences were found to match
human proteins in this database (Table 1). Thirty-eight
(64%) of those sequences matched the beginning of a
human protein, suggesting that the complete coding
region had been cloned (full-length clones). Thirty-nine
(66%) of the 59 sequences were fused to the N-terminal
sequence RGS-His6 in the correct reading frame (RF+).
Protein molecular masses were predicted for these
clones by completing their 5'-tag sequences using the
matching sequences in the database (Table 1),
considering that the formyl group of the N-terminal
formylmethionine is removed in E. coli and that the resulting
N-terminal methionine is usually not removed if it is
followed by arginine (Sherman et al., 1985).
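
A predicted mass of the kind described above is the sum of the average residue masses of the translated sequence plus one water. The sketch below shows that calculation in general form; it is not the authors' software, the residue masses are standard average values, and the example sequence is only the N-terminal tag (Met retained before Arg, as the text notes) without any vector linker or insert residues, which the text does not spell out.

```python
# Standard average residue masses in Da (amino acid minus one water)
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini


def average_mass(sequence: str) -> float:
    """Average molecular mass in Da of an unmodified linear polypeptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


if __name__ == "__main__":
    tag = "MRGSHHHHHH"  # Met + RGS-His6; illustrative fragment only
    print(f"{tag}: {average_mass(tag):.2f} Da")
```

For a real clone the tag, any linker residues and the insert-encoded residues would be concatenated before the call.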

Expression products of the same 96 clones were
analyzed by SDS-PAGE of cellular protein extracts and
by nickel chelate affinity purification, followed by both
SDS-PAGE and MALDI-TOF-MS. Protein expression
and all subsequent steps were performed in 96-well
microtiter plates. Seventy-two (75%) of the total 96
clones expressed recombinant proteins detectable in
SDS-PAGE. Thirty-five of the 39 in-frame clones
produced RGS-His6 tag fusion proteins of expected sizes,
TABLE 1



Protein Expression Properties of hEx1 Clones with Sequence Database Matches

Columns: clone (MPMGp800...); SP-TrEMBL database match (accession No., protein name); first matched amino acid in database sequence; reading frame (RF); predicted protein size (kDa); expressed protein size measured by MALDI-TOF-MS (kDa); expressed protein size estimated by SDS-PAGE (kDa) (a).

Clone | Accession No. | Protein name | First matched amino acid | RF | Predicted (kDa) | MALDI-TOF-MS (kDa) | SDS-PAGE (kDa) (a)
A06582 | O00217 | NADH-ubiquinone oxidoreductase 23 kDa subunit precursor (EC 1.6.5.3) | 1 | - | - | 8.602 | 12
A08582 | P04687 | Tubulin α-1 chain | 174 | + | 33.859 | 33.865 | 35
A12582 | P04765 | Eukaryotic initiation factor 4A-I (eIF-4A-I) | 12 | + | 47.808 | 47.819 | 50
A14582 | Q11203 | CMP-N-acetylneuraminate-β-1,4-galactoside α-2,3-sialyltransferase | 234 | + | 18.825 | 18.831 (b) | 20
A16582 | P13639 | Elongation factor 2 (EF-2) | 1 | + | 98.098 | (18.458) | (97)
A18582 | P49006 | MARCKS-related protein (MAC-MARCKS) | 1 | - | - | 12.282 | 14
A20582 | P56182 | NNP-1 protein (D21S2056E) | 240 | + | 27.850 | 27.857 | 34
A24582 | P49006 | MARCKS-related protein (MAC-MARCKS) | 1 | - | - | 10.052 | 13
C06582 | Q15853 | Upstream stimulatory factor 2 | 128 | - | - | 13.908 | 18
C10582 | P36578 | 60S ribosomal protein L1 (L4) | 1 | - | - | 10.304 | 10
C12582 | P49006 | MARCKS-related protein (MAC-MARCKS) | 1 | - | - | 13.557 | 16
E02582 | P43308 | Translocon-associated protein, β subunit precursor (TRAP-β) | 1 | - | - | 10.753 | 10
E04582 | P14793 | 60S ribosomal protein L40 (CEP52) | 36 | - | - | 7.750 | 10
E10582 | O75312 | Zinc-finger protein ZPR1 | 1 | + | 54.299 | 54.334 | 55
E12582 | P25111 | 40S ribosomal protein S25 | 1 | - | - | 7.504 | 10
E1?582 | P04687 | Tubulin α-1 chain | 304 | - | - | 11.301 | 10
E18582 | Q15560 | Transcription elongation factor S-II | 1 | + | 39.238 | 8.445 | 10
E20582 | Q13885 | β-Tubulin | 275 | + | 22.579 | 22.594 | 23
G02582 | P14923 | Junction plakoglobin | 287 | + | 53.034 | 53.066 | 50
G04582 | Q16478 | Glutamate receptor subunit | 832 | - | - | 12.653 | n.e.
G10582 | P07108 | Acyl-CoA-binding protein (ACBP) | 1 | + | 15.098 | 15.109 (c) | 15
G12582 | P48735 | Isocitrate dehydrogenase (NADP), mitochondrial precursor (EC 1.1.1.42) | 1 | + | 55.802 | 55.832 | 50
G14582 | P39023 | 60S ribosomal protein L3 | 225 | - | - | 8.904 | 10
G16582 | P15880 | 40S ribosomal protein S2 (S4) | 1 | + | 34.328 | 34.302 | 35
G20582 | P54198 | HIRA protein | 383 | + | 72.278 | 72.316 (c) | 65
I02582 | P36404 | ADP-ribosylation factor-like protein 2 | 1 | - | - | 12.703 | 12
I04582 | P30086 | Phosphatidylethanolamine-binding protein | 1 | + | 27.046 | 27.029 (c) | 29
I06582 | P25111 | 40S ribosomal protein S25 | 1 | + | 17.678 | 17.685 | 23
I10582 | P39023 | 60S ribosomal protein L3 | 1 | + | 49.026 | 49.037 (c) | 50
I12582 | P15880 | 40S ribosomal protein S2 (S4) | 1 | + | 34.328 | 34.329 (c) | 35
I14582 | Q13098 | G protein pathway suppressor 1 (GPS1 protein) | 193 | - | - | 17.996 | 18
I18582 | P05092 | Peptidyl-prolyl cis-trans isomerase A | 1 | + | 21.441 | 21.460 | 23
I20582 | P02570 | Actin, cytoplasmic 1 (β-actin) | 1 | + | 47.297 | 47.338 | 45
I24582 | P23396 | 40S ribosomal protein S3 | 1 | + | 29.749 | 29.761 (c) | 32
K04582 | Q03827 | Transcription factor ETR101 | 97 | + | 16.184 | 16.182 | 23
K08582 | Q00403 | Transcription initiation factor IIB (TFIIB) | 1 | + | 38.133 | 38.156 | 38
K10582 | O15143 | ARP2/3 complex 41 kDa subunit (P41-ARC) | 1 | + | 46.403 | 46.405 | 45
K12582 | Q1566? | Asparagine synthetase (fragment) | 1 | - | - | 17.806 | 21
K14582 | P49241 | 40S ribosomal protein S3A | 1 | + | 33.094 | 33.095 (c) | 35
K16582 | Q99719 | Cell division control-related protein | 1 | - | - | 8.509 | 12
K18582 | P04687 | Tubulin α-1 chain | 306 | + | 19.199 | 19.202 | 21
K20582 | O00240 | Dihydropyrimidinase-related protein-4 (DRP-4) | 1 | - | - | 15.257 | [illegible]
M02582 | Q13885 | β-Tubulin | 253 | - | - | 26.301 | 27
M04582 | P49368 | T-complex protein 1, γ subunit (TCP-1-γ) | 19 | + | 61.326 | 61.352 | 60
M10582 | P02571 | Actin, cytoplasmic 2 (γ-actin) | 1 | + | 46.718 | 46.749 | 45
M12582 | P32969 | 60S ribosomal protein L9 | 1 | - | - | 9.950 | 14
M18582 | Q13885 | β-Tubulin | 1 | + | 54.291 | 54.307 | 58
M20582 | P02768 | Serum albumin precursor | 116 | + | 59.076 | 59.1?6 | 60
M22582 | P02570 | Actin, cytoplasmic 1 (β-actin) | 1 | + | 47.297 | 47.306 | 45
M24582 | Q06830 | Thioredoxin peroxidase 2 | 1 | + | 26.231 | 26.222 | 25
O02582 | Q02878 | 60S ribosomal protein L6 | 1 | + | 35.457 | 35.450 (c) | 35
O04582 | Q02878 | 60S ribosomal protein L6 | 1 | + | 35.457 | 35.467 (c) | 35
O06582 | P17080 | GTP-binding nuclear protein RAN (TC4) | 1 | + | 28.494 | 28.516 (c) | 30
O08582 | Q08379 | Golgin-95 | 414 | + | 26.312 | 26.303 | 30
O10582 | P01922 | Hemoglobin α-chain | 1 | + | 15.126 | 15.122 | 20
O14582 | P21810 | Bone/cartilage proteoglycan I precursor (biglycan) (PG-S1) | 1 | - | - | 9.231 | 14
O16582 | Q15597 | Translation initiation factor eIF-4γ (fragment) | 215 | + | 58.069 | 58.108 | 60
O18582 | Q14257 | Calcium-binding protein ERC-55 precursor | 24 | + | 42.157 | 42.165 | 60
O20582 | Q02543 | 60S ribosomal protein L18A | 1 | + | 24.374 | 24.379 (c) | 26

Note. RF, reading frame of insert in relation to His6-tag. Expressed protein size was determined by MALDI-TOF-MS and estimated by
SDS-PAGE referring to the band of the largest size visible against the E. coli background.
(a) n.e., no expression observed.

(b) Prior to reduction with DTT, abundant signals corresponding to the monomer, monomer + 1 glutathione residue, and the protein dimer
were observed.

(c) Prior to reduction with DTT the determined mass was 295-315 Da higher (+ glutathione).



while in the remaining four clones (E18582, K10582,
O02582, and O04582) no expression products could be
detected (Table 1). His6-tagged proteins were affinity-
purified under denaturing conditions using Ni-NTA
agarose beads and filter plates (Fig. 1). Six expression
products, including E18582, K10582, O02582, and
O04582, which were not detected in whole cellular
protein extracts, could be identified after purification.
Protein sizes were determined by SDS-PAGE and
MALDI-TOF-MS (Fig. 2), and both data sets were
compared to the corresponding values predicted from
DNA sequencing data (Table 1).

FIG. 1. SDS-PAGE of nickel chelate purified proteins. Following protein expression in microtiter plates, cells were lysed under
denaturing conditions. His6-tagged proteins were purified by nickel chelate chromatography in microtiter plates and were analyzed by
SDS-PAGE, followed by Coomassie staining. Lanes are labeled using RZPD clone names, omitting the prefix MPMGp800.

FIG. 2. MALDI-TOF-MS of nickel chelate purified proteins. Following nickel chelate affinity purification, the obtained eluates were
analyzed by MALDI-TOF-MS. (Top three panels) Some recorded mass spectra; clones and labeling as in Fig. 1. M+ and M2+, singly and
doubly charged molecular ions of expected expression products. The numbers of the first and last amino acids indicate assigned C-terminally
truncated protein sequences. For clone A16582, the expected protein (98.098 kDa) could not be detected. For some expression products, the
determined molecular masses exceeded the predicted values by approximately 300 Da (Table 1), indicative of glutathionylation. In addition,
for A14582 a strong protein-dimer molecular ion signal was recorded, indicative of protein-protein disulfide bridges. These indications were
verified by reduction with DTT prior to the mass spectrometric analysis. (Bottom) Mass spectra obtained from A14582, G10582, and M24582
before (top spectrum) and after (bottom spectrum) reduction with DTT. +G, single glutathionylation; +SA, adduction of one sinapic acid
molecule used as MALDI matrix.

As expected, the predicted molecular masses were
considerably better matched by the masses determined
with MALDI-TOF-MS than by those estimated from
SDS-PAGE (Table 1). For most clones, the determined
molecular mass deviated less than 0.1% from the
predicted value. The fusion protein expected for clone
A16582 (98.098 kDa) could not be detected; instead an
abundant signal at m/z 18,446 dominated the recorded
spectrum. Considering that the N-terminal sequence
RGS-His6 is vital for the applied affinity purification
and that N-terminal methionine is usually not
removed within E. coli if followed by arginine (Sherman
et al., 1985), this signal could be assigned to the
truncated sequence 1-169 (expected m/z 18,442). SDS-
PAGE of the purified A16582 protein showed a number
of bands ranging from approximately 20 to
approximately 100 kDa (Fig. 1). These results suggest that the
protein is unstable in E. coli. An incomplete expression
product was also observed for clone E18582. Various
other clones produced abundant C-terminally
truncated expression products in addition to the expected
product. Figure 2 shows a selection of the recorded
mass spectra including signal interpretation. Clone
E20582 expressed a 25,075-Da protein of unknown
identity in addition to the expected 22,579-Da protein.
A possible explanation would be a frameshift mutation
leading to a larger expression product in a
subpopulation of the E20582 E. coli cells.
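
The "less than 0.1%" agreement can be illustrated with a few predicted and measured pairs copied from Table 1. The sketch below is only a worked check on those printed values; the helper name and the choice of example clones are illustrative.

```python
def percent_deviation(predicted_kda: float, measured_kda: float) -> float:
    """Absolute deviation of the measured mass from the predicted mass, in percent."""
    return 100.0 * abs(measured_kda - predicted_kda) / predicted_kda


# (clone, predicted kDa, MALDI-TOF-MS kDa) as printed in Table 1
EXAMPLES = [
    ("A08582", 33.859, 33.865),
    ("A12582", 47.808, 47.819),
    ("E10582", 54.299, 54.334),
    ("A16582", 98.098, 18.458),  # truncated product, expected to fail the check
]

if __name__ == "__main__":
    for clone, predicted, measured in EXAMPLES:
        dev = percent_deviation(predicted, measured)
        verdict = "within 0.1%" if dev < 0.1 else "outside 0.1%"
        print(f"{clone}: {dev:.3f}% ({verdict})")
```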

For the clones A14582, G10582, G12582, G16582,
G20582, I04582, I10582, I12582, I24582, K04582,
K10582, M24582, O02582, O04582, and O06582, the
determined molecular masses exceeded the expected
values by 0.290-0.320 kDa (three examples are shown
in Fig. 2). This deviation is indicative of
glutathionylation (attachment of one glutathione residue to a
cysteine residue by formation of a disulfide bridge), an
essential intermediate reaction of disulfide bridge
reduction in E. coli. In addition, for the clone A14582 a
strong signal corresponding to the molecular mass of
the protein dimer was detected (Fig. 2). In addition to
singly charged molecular ions, to a lower degree
nonspecific multimers as well as doubly and triply charged
molecular ions are formed during MALDI-TOF-MS of
protein samples. Therefore, peak intensities must be
taken into consideration to recognize protein dimers in
the sample. Since the MALDI sample preparation
conditions used (pH <2, 35% acetonitrile) denature most
protein-protein interactions, a covalent linkage is
likely to account for the observed dimers. Both
glutathionylation and protein-protein disulfide bridges
were verified by reduction with DTT prior to the mass
spectrometric analysis. After reduction, the
determined protein masses matched the expected masses in
all cases within 0.1% maximum deviation, and no more
protein dimers were detected (Fig. 2).
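
The mass shift attributed to glutathionylation can be estimated from the average mass of glutathione (about 307.32 Da) minus the two hydrogens lost on forming the mixed disulfide, giving roughly 305 Da, which sits inside the 290-320 Da window reported above. The sketch below flags a measured mass as consistent with one glutathione adduct; the window is the one from the text, while the function name and the example masses are hypothetical.

```python
GSH_AVERAGE_MASS = 307.32                      # Da, glutathione (C10H17N3O6S)
ADDUCT_SHIFT = GSH_AVERAGE_MASS - 2 * 1.008    # Da, mixed disulfide loses two H


def looks_glutathionylated(expected_da: float, observed_da: float,
                           window=(290.0, 320.0)) -> bool:
    """True if observed minus expected mass falls in the reported 290-320 Da window."""
    delta = observed_da - expected_da
    return window[0] <= delta <= window[1]


if __name__ == "__main__":
    expected = 26_231.0                         # hypothetical expected mass in Da
    print(round(ADDUCT_SHIFT, 1))
    print(looks_glutathionylated(expected, expected + ADDUCT_SHIFT))
    print(looks_glutathionylated(expected, expected + 9.0))
```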

Expression products from inserts cloned in incorrect
reading frames (RF-) were generally smaller than
those from inserts in the correct reading frame (RF+,
Table 1). As shown in Fig. 3, the molecular mass of
expression products is correlated to the reading frame
of the cDNA insert. Sixteen of 17 expression products
smaller than 15 kDa derived from RF- clones, while
31 of 32 expression products of at least 20 kDa size
derived from RF+ clones. Thus the molecular mass of
a clone's expression product can be used as a measure
to predict the reading frame of its cDNA insert, if the
DNA sequence is unknown.
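
The size rule stated above can be written as a small classifier. This is a sketch of the heuristic as described in the text, not a validated predictor: the two cut-offs are the ones quoted (below 15 kDa and at least 20 kDa), and leaving the range in between undetermined is a choice made here.

```python
def predict_reading_frame(mass_kda: float) -> str:
    """Guess the insert reading frame from the expression product mass in kDa."""
    if mass_kda >= 20.0:
        return "RF+"           # 31 of 32 such products came from in-frame clones
    if mass_kda < 15.0:
        return "RF-"           # 16 of 17 such products came from out-of-frame clones
    return "undetermined"      # 15-20 kDa: the text gives no rule for this range


if __name__ == "__main__":
    for mass in (8.602, 17.996, 33.865):
        print(mass, "kDa ->", predict_reading_frame(mass))
    print(f"hit rates: {16 / 17:.1%} (RF-), {31 / 32:.1%} (RF+)")
```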

The screening of commonly available cDNA
expression libraries for functional activities is complicated by
large numbers of clones that do not express their cDNA
inserts as proteins. By arraying, antibody screening,
and molecular mass detection, putative expression
clones can be detected with a high level of efficiency.
Therefore smaller numbers of clones must be assayed,
and functional screening in microtiter plates becomes
practicable. As an example, a GAPDH activity assay
was adapted to the microtiter plate format. A positive
control clone (D215) expressing human GAPDH as an
RGS-His6 tag fusion protein was introduced in
exchange for one of the 96 hEx1 clones of Fig. 1. Protein
expression was induced, and cells were lysed in 150 µl
lysis buffer under nondenaturing conditions. GAPDH
activities were measured in 0.1-µl aliquots of the
lysates, and the positive control clone (D215) was clearly
identified (Fig. 4). Duplicate experiments gave
identical activity patterns with at least three additional
clones (C04582, C06582, and C08582) above an
arbitrary background. Two of these clones express products
that did not match human proteins in the database,
while the third represents a short out-of-frame
fragment.

FIG. 3. Expression product size and reading frame. Relationship
between the size of expressed recombinant protein and reading
frame in the same clones as in Table 1 (x axis: molecular mass
range [kDa]). Clones represented by gray bars contain cDNA inserts
translated in the correct reading frame (RF+), whereas in the other
clones (white bars) translation can occur only in an incorrect
reading frame (RF-). Numbers of clones are indicated in the bars.

DISCUSSION 

Structural genomics Is expected to provide a linlc 
between DNA sequence information and protein func- 
tion (Gaasterland, 1998; Kim. 1998; Rost. 1998). This 
requires the expression and characterization of large 
numbers of human proteins. We have shown that de- 
sired protein expression clones can be selected at high 
throughput from an arrayed cDNA library using a mul- 
tistep screening procedure. This highly parallel ap- 
proach seems to be an efficient alternative to the sub- 
cloning of individual cDNA sequences. In a first step, 
high-density protein filters are screened for putative 
expression clones (Büssow et al., 1998; Lueking et al.,
1999). Those clones are then rearrayed into a subli- 
brary enriched for in-frame inserts. DNA sequence 
analysis of 93 randomly chosen clones of the hEx1
Clone

FIG. 4. GAPDH activity screen. Screening for GAPDH activity in bacterial lysates of 96 clones of the hEx1 library; clone numbers and
labeling as in Fig. 1; positive control, MPMGp800D215 containing a GAPDH cDNA insert.



library showed that among the known genes, two- 
thirds (66%) expressed their inserts in the correct read- 
ing frame, reflecting the general efficiency of the 
screening method. The two-thirds (64%) of clones containing
the complete coding sequence of a human protein
do of course reflect the usual bias toward smaller
products of cDNA libraries (Table 1). 

In a second step, proteins are expressed in microtiter 
plates, and SDS-PAGE, nickel chelate affinity purifi- 
cation, and MALDI-TOF-MS are used to identify the 
best protein-expressing clones. DNA sequencing was 
used to correlate reading frames with protein sizes 
determined by SDS-PAGE and MALDI-TOF-MS. Al- 
though both methods returned compatible results, the 
latter was found to give the most accurate indication 
for the reading frame of cDNA inserts, because nearly 
all clones with expression products of at least 20 kDa
had inserts in the correct reading frame (Fig. 3). This 
indicates that MALDI-TOF-MS size selection is a useful
criterion for confirmation of expression clones. The
molecular masses determined by MALDI-TOF-MS are 
in good agreement with the values predicted from the 
corresponding DNA sequences (Table 1). This reflects 
the expected experimental mass accuracy considering 
that all samples were analyzed using identical instru- 
mental settings and the same external calibration con- 
stants. These examples demonstrate the power of 
MALDI-TOF-MS for characterizing cDNA expression 
products at high throughput. The detection is sensitive 
(midfemtomolar range), is rapid (<1 min per sample), 
and provides detailed and accurate information about 



the status and purity of the expression products. Ques- 
tions as to whether a certain clone produces high- 
quality protein for X-ray or NMR analysis or whether 
size exclusion chromatography following affinity puri- 
fication can provide the necessary purity and homoge- 
neity can be addressed early and at low cost. MALDI- 
TOF-MS analysis of whole libraries will provide a 
catalogue of expression clones for large numbers of 
human proteins. By including tryptic peptide mass 
fingerprinting, new clones will be directly identified by 
database comparison (Pappin et al., 1993).
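
The comparison of a measured molecular mass with the value predicted from the corresponding sequence can be illustrated with a short sketch. It is not taken from the source: the function names, the example tag sequence, and the 0.1% relative tolerance are assumptions, and the calculation uses standard average residue masses for an unmodified polypeptide.

```python
# Standard average residue masses in Da (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water is added for the free N- and C-termini

def predicted_mass_da(protein_sequence):
    """Average mass (Da) of an unmodified polypeptide in one-letter code."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein_sequence.upper()) + WATER

def masses_agree(predicted_da, measured_da, rel_tol=1e-3):
    """True if the measured mass lies within rel_tol of the predicted mass.

    The 0.1% default is an assumed tolerance, not a figure from the source.
    """
    return abs(measured_da - predicted_da) <= rel_tol * predicted_da

# Hypothetical example: mass of an RGS-His6 tag peptide on its own.
tag_mass = predicted_mass_da("RGSHHHHHH")
print(round(tag_mass, 2), masses_agree(20000.0, 20010.0))
```

In practice the predicted value would be computed from the translated insert plus the tag, and a measured mass far below the prediction would point to truncation or an out-of-frame insert.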

In a third step, the microtiter plate technology was 
extended to functional screening. A spectrophotometric 
enzyme assay was developed that detects GAPDH expression
clones among 96 hEx1 library clones, using
nondenaturing bacterial lysates. It is expected that 
this kind of assay can be adapted to screen expression 
libraries for other biological activities in the microtiter
plate format. Only 1/1000 of the bacterial lysate was
used for the GAPDH assay. Therefore, the amount of 
protein is not expected to be limiting in future assays. 
Even if the bulk of a protein of interest is expressed in
insoluble form, a small soluble fraction could be suffi- 
cient for detection. If necessary, affinity purification in 
microtiter plates can be used to reduce the background 
of E. coli proteins. 

Up-scaling of the assay from 96 clones to the whole 
library will enable the detection of more clones with 
low to medium activity. This might include a certain 
degree of false-positive background but will also detect 
new biological activity of yet uncharacterized human
proteins. 

In summary, the use of robot technology for handling 
and arraying of cDNA libraries, in combination with 
high-throughput microtiter plate techniques and 
MALDI-TOF-MS for the analysis of gene products, 
enables the generation of a catalogue of expression 
clones as a tool for the characterization of the human 
proteome. 